sqlite-utils: a nice way to import data into SQLite for analysis

Hello! This is a quick post about a nice tool I found recently called sqlite-utils.

Recently I wanted to do some basic data analysis using data from my Shopify store. So I figured I'd query the Shopify API and import my data into SQLite, and then I could make queries to get the graphs I want.

But this seemed like a lot of boring work, like I'd have to write a schema and write a Python program. So I hunted around for a solution, and I found sqlite-utils, a tool designed to make it easy to import arbitrary data into SQLite so you can do data analysis on it.

sqlite-utils automatically generates a schema

The Shopify data has about a billion fields and I really did not want to type out a schema for it. sqlite-utils solves this problem: if I have an array of JSON orders, I can create a new SQLite table with that data in it like this:


    import sqlite_utils

    orders = ... # (some code to get the `orders` array here)

    db = sqlite_utils.Database('orders.db')
    db['shopify_orders'].insert_all(orders)
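
If you're curious what it came up with, you can ask the library for the CREATE TABLE statement it generated (this quick check is my addition, not from the original post):

    # print the schema sqlite-utils inferred from the JSON orders
    print(db['shopify_orders'].schema)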

you can alter the schema if there are new fields (with alter)

Next, I ran into a problem where on the 5th page of downloads, the JSON contained a new field that I hadn't seen before.

Luckily, sqlite-utils thought of that: there's an alter flag which will update the table's schema to include the new fields.

Here's what the code for that looks like:


    db['shopify_orders'].insert_all(orders, alter=True)
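
To make that concrete, here's a small standalone sketch (my own toy example, not from the post) showing that alter=True adds the unseen column instead of raising an error:

    import sqlite_utils

    # toy data: the second batch has a field the table hasn't seen yet
    db = sqlite_utils.Database(memory=True)
    db['orders'].insert_all([{"id": 1, "total": "10.00"}])
    db['orders'].insert_all(
        [{"id": 2, "total": "5.00", "discount_code": "SPRING"}],
        alter=True,  # adds the new "discount_code" column to the table
    )
    print(db['orders'].columns_dict)  # now includes "discount_code"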

you can deduplicate existing rows (with upsert)

Next I ran into a problem where sometimes, when doing a sync, I'd download data from the API where some of it was new and some wasn't.

So I wanted to do an “upsert”, where it only created new rows if the item didn't already exist. sqlite-utils also thought of this, and there's an upsert method.

For this to work you have to specify the primary key. For me that was pk="id". Here's what my final code looks like:


    db['shopify_orders'].upsert_all(
        orders,
        pk="id",
        alter=True
    )
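
As a rough illustration of what this buys you (again, my own sketch, not from the post): upserting a record whose id already exists updates that row instead of duplicating it:

    import sqlite_utils

    # toy data: "syncing" order 1 twice only updates it
    db = sqlite_utils.Database(memory=True)
    db['orders'].upsert_all([{"id": 1, "total": "10.00"}], pk="id")
    db['orders'].upsert_all(
        [{"id": 1, "total": "12.00"}, {"id": 2, "total": "5.00"}],
        pk="id",
    )
    print(db['orders'].count)       # 2 rows, not 3
    print(list(db['orders'].rows))  # id 1 now has total "12.00"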

there's also a command line tool

I've talked about using sqlite-utils as a library so far, but there's also a command line tool which is really useful.

For example, this inserts the data from a plants.csv file into a plants table:


    sqlite-utils insert plants.db plants plants.csv --csv
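
To check that the import worked (my addition, not from the post), the command line tool can also read the data back out; for example, this prints the rows of the plants table as JSON:

    sqlite-utils rows plants.db plants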

format conversions

I haven't tried this yet, but here's a cool example from the help docs of how you can do format conversions, like converting a string to a float:


    sqlite-utils insert plants.db plants plants.csv --csv --convert '
    return {
      "name": row["name"].upper(),
      "latitude": float(row["latitude"]),
      "longitude": float(row["longitude"]),
    }'

This seems really useful for CSVs, where by default it'll often interpret numeric data as strings if you don't do this conversion.
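
If you're using the Python library rather than the command line, a rough equivalent (my sketch, assuming the same plants.csv with name/latitude/longitude columns) is to convert the values yourself before inserting:

    import csv
    import sqlite_utils

    db = sqlite_utils.Database('plants.db')
    with open('plants.csv', newline='') as f:
        rows = [
            {
                "name": row["name"].upper(),
                "latitude": float(row["latitude"]),
                "longitude": float(row["longitude"]),
            }
            for row in csv.DictReader(f)
        ]
    db['plants'].insert_all(rows)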

metabase seems nice too

Once I had all the data in SQLite, I needed a way to draw graphs with it. I wanted some dashboards, so I ended up using Metabase, an open source business intelligence tool. I found it very straightforward and it seems like a really easy way to turn SQL queries into graphs.

This whole setup (sqlite-utils + metabase + SQL) feels a lot easier to use than my previous setup, where I had a custom Flask website that used plotly and pandas to draw graphs.

that's all!

I was really delighted by sqlite-utils: it was super easy to use and it did everything I wanted.


via: https://jvns.ca/blog/2022/05/12/sqlite-utils--a-nice-way-to-import-data-into-sqlite/

Author: Julia Evans  Topic selected by: lujun9972  Translator: 译者ID  Proofreader: 校对者ID

This article was originally compiled by LCTT and is proudly presented by Linux中国 (Linux China).