How to flatten double-nested json file in Pandas-CodePudding

I need to convert a json file to a dataframe however that json file have columns where there are nested data. I was able to successfully convert columns having one layer of nest using pd.json_normalize(data=df['column'])

{'ts_tc': '2021-10-21T21:50:05.121Z', 'line_of_business': 'kiosk', 'consent_status': 'active'}

My problem is I have another column of list of dictionaries having double nested data in the form of array where the epd is another dictionary.

[{'uid': 'abcd', 'sid': '1234', 'sc': 'false', 'epd': '{"value1":"66.72","value2":"66.72123"}'}, {'uid': 'abcd', 'sid': '1234', 'epd': '{"value1":"69.72","value2":"69.72123"}'}]

When I used pd.json_normalize I got 'list' object has no attribute 'values' error message. I also tried flatten_json library and it didn't work for me. How do I flatten the dataframe completely so that I have:

uid	sid	sc	value1	value2
abcd	1234	false	66.72	66.72123
abcd	1234		69.72	69.72123

CodePudding user response：

If the keys of the epd field are subject to change, you're safer going line by line:

import pandas as pd
import json

json_data = [{'uid': 'abcd', 'sid': '1234', 'sc': 'false', 'epd': '{"value1":"66.72","value2":"66.72123"}'}, {'uid': 'abcd', 'sid': '1234', 'epd': '{"value1":"69.72","value2":"69.72123"}'}]

df = pd.DataFrame(json_data)
    
for i in range(len(df)):
    nested_data = json.loads(df.at[i, 'epd'])
    for key in nested_data:
        df.at[i, key] = nested_data[key]
del df['epd']

print(df)

Output:

    uid   sid     sc value1    value2
0  abcd  1234  false  66.72  66.72123
1  abcd  1234    NaN  69.72  69.72123