Use Pandas to write JSON list to separate rows to sqlite3 db

Time:11-29

Imagine JSON that looks like this:

[{"name": "foo", 'attr': ['bar', 'baz']}, {...}]

I can easily read this into a DataFrame, no problem. (Python 3.9.2, Pandas 1.5.1)

If I just try it, I get sqlite3.InterfaceError: Error binding parameter 2 - probably unsupported type., so it dies on the list type.
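For context, the failure can be reproduced with the sqlite3 module alone (a minimal sketch; the table and column names here are made up): sqlite3 only binds types it has adapters for (int, float, str, bytes, None), so a list raises InterfaceError, which is what to_sql() runs into.

```python
import sqlite3

# Minimal reproduction: sqlite3 cannot bind a Python list as a parameter,
# which is the same failure to_sql() hits on the list-valued column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, attr)")
try:
    conn.execute("INSERT INTO t VALUES (?, ?)", ("foo", ["bar", "baz"]))
except sqlite3.InterfaceError as exc:
    print("binding failed:", exc)
```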

What I want to do is use to_sql() or something like it to write to a normalized table, e.g.,

name attr
foo  bar
foo  baz

Is there an easy way of doing this? Also, I have full control over the production of the JSON, so if there is a better JSON format for Pandas, I can reshape the input file.

CodePudding user response:

If I have understood you correctly, you have multiple ways of achieving your goal.

Method 1: Using your current JSON structure

First, taking your approach, you have a list of dicts like this:

my_json = [{"name": "foo1", 'attr': ['bar1', 'baz1']}, {"name": "foo2", 'attr': ['bar2', 'baz2']}]

and you create a pandas.DataFrame object like this:

import pandas as pd
my_df = pd.DataFrame(my_json)

which gives you the following:

   name          attr
0  foo1  [bar1, baz1]
1  foo2  [bar2, baz2]

Now, since the attr column contains lists (Python objects), it cannot be exported as-is to a SQLite db. So we have to explode this column to get my_df into the desired form. For this, we apply the following:

my_df = my_df.explode('attr').reset_index(drop=True)

Now our DataFrame looks like this:

   name  attr
0  foo1  bar1
1  foo1  baz1
2  foo2  bar2
3  foo2  baz2

All you need to do now is export it to SQLite. For this we need an engine; I use SQLAlchemy to create one:

import sqlalchemy

my_engine = sqlalchemy.create_engine('sqlite:///test.db')
my_df.to_sql("test_table_1", my_engine, if_exists="replace", index=False)

And now we have a table.
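The same normalization can also be done without pandas, using only the standard library (a sketch assuming the JSON arrives as a string; the table name mirrors the one above):

```python
import json
import sqlite3

raw = '[{"name": "foo1", "attr": ["bar1", "baz1"]}, {"name": "foo2", "attr": ["bar2", "baz2"]}]'

# One output row per (name, attr-element) pair -- the "explode" step by hand.
rows = [(rec["name"], a) for rec in json.loads(raw) for a in rec["attr"]]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table_1 (name TEXT, attr TEXT)")
conn.executemany("INSERT INTO test_table_1 VALUES (?, ?)", rows)
print(conn.execute("SELECT name, attr FROM test_table_1").fetchall())
# → [('foo1', 'bar1'), ('foo1', 'baz1'), ('foo2', 'bar2'), ('foo2', 'baz2')]
```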

Method 2: Different JSON Structure

Since you said you have control over the structure of the data, I personally like to do as little computing on the service side as possible. So I think the following JSON structure would be best:

my_json2 = [
    {"name": "foo1", 'attr': 'bar1'},
    {"name": "foo1", 'attr': 'baz1'},
    {"name": "foo2", 'attr': 'bar2'},
    {"name": "foo2", 'attr': 'baz2'},
]

The downside of this is that the JSON size (i.e. number of records) increases, but loading into a df is straightforward.

my_df2 = pd.DataFrame(my_json2)

which gives me:

   name  attr
0  foo1  bar1
1  foo1  baz1
2  foo2  bar2
3  foo2  baz2

Now we can export it in a similar way as in method 1:

my_df2.to_sql('test_table_2', my_engine, if_exists='replace', index=False)
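If the producer side is plain Python, the flat structure can be emitted directly from the nested one (a sketch; the nested variable here is a hypothetical stand-in for whatever the producer currently serializes):

```python
import json

# Hypothetical producer-side data before serialization
nested = [{"name": "foo1", "attr": ["bar1", "baz1"]},
          {"name": "foo2", "attr": ["bar2", "baz2"]}]

# Flatten before dumping: one record per (name, attr) pair
flat = [{"name": rec["name"], "attr": a} for rec in nested for a in rec["attr"]]
print(json.dumps(flat, indent=2))
```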