Home > Software engineering >  How to flatten double-nested json file in Pandas
How to flatten double-nested json file in Pandas

Time:11-03

I need to convert a json file to a dataframe however that json file have columns where there are nested data. I was able to successfully convert columns having one layer of nest using pd.json_normalize(data=df['column'])

{'ts_tc': '2021-10-21T21:50:05.121Z', 'line_of_business': 'kiosk', 'consent_status': 'active'}

My problem is I have another column of list of dictionaries having double nested data in the form of array where the epd is another dictionary.

[{'uid': 'abcd', 'sid': '1234', 'sc': 'false', 'epd': '{"value1":"66.72","value2":"66.72123"}'}, {'uid': 'abcd', 'sid': '1234', 'epd': '{"value1":"69.72","value2":"69.72123"}'}]

When I used pd.json_normalize I got 'list' object has no attribute 'values' error message. I also tried flatten_json library and it didn't work for me. How do I flatten the dataframe completely so that I have:

uid sid sc value1 value2
abcd 1234 false 66.72 66.72123
abcd 1234 69.72 69.72123

CodePudding user response:

If the keys of the epd field are subject to change, you're safer going line by line:

import pandas as pd
import json

json_data = [{'uid': 'abcd', 'sid': '1234', 'sc': 'false', 'epd': '{"value1":"66.72","value2":"66.72123"}'}, {'uid': 'abcd', 'sid': '1234', 'epd': '{"value1":"69.72","value2":"69.72123"}'}]

df = pd.DataFrame(json_data)
    
for i in range(len(df)):
    nested_data = json.loads(df.at[i, 'epd'])
    for key in nested_data:
        df.at[i, key] = nested_data[key]
del df['epd']

print(df)

Output:

    uid   sid     sc value1    value2
0  abcd  1234  false  66.72  66.72123
1  abcd  1234    NaN  69.72  69.72123
  • Related