I need to convert a json file to a dataframe however that json file have columns where there are nested data. I was able to successfully convert columns having one layer of nest using pd.json_normalize(data=df['column'])
{'ts_tc': '2021-10-21T21:50:05.121Z', 'line_of_business': 'kiosk', 'consent_status': 'active'}
My problem is I have another column of list of dictionaries having double nested data in the form of array where the epd is another dictionary.
[{'uid': 'abcd', 'sid': '1234', 'sc': 'false', 'epd': '{"value1":"66.72","value2":"66.72123"}'}, {'uid': 'abcd', 'sid': '1234', 'epd': '{"value1":"69.72","value2":"69.72123"}'}]
When I used pd.json_normalize I got 'list' object has no attribute 'values' error message. I also tried flatten_json library and it didn't work for me. How do I flatten the dataframe completely so that I have:
uid | sid | sc | value1 | value2 |
---|---|---|---|---|
abcd | 1234 | false | 66.72 | 66.72123 |
abcd | 1234 | 69.72 | 69.72123 |
CodePudding user response:
If the keys of the epd field are subject to change, you're safer going line by line:
import pandas as pd
import json
json_data = [{'uid': 'abcd', 'sid': '1234', 'sc': 'false', 'epd': '{"value1":"66.72","value2":"66.72123"}'}, {'uid': 'abcd', 'sid': '1234', 'epd': '{"value1":"69.72","value2":"69.72123"}'}]
df = pd.DataFrame(json_data)
for i in range(len(df)):
nested_data = json.loads(df.at[i, 'epd'])
for key in nested_data:
df.at[i, key] = nested_data[key]
del df['epd']
print(df)
Output:
uid sid sc value1 value2
0 abcd 1234 false 66.72 66.72123
1 abcd 1234 NaN 69.72 69.72123