I fetch data from MySQL database. The fetched data has a column with nested lists and dictionaries. This json is similar to the stored data in my database:
my_dict = {'id': [1, 2, 3],
'b': [{'100': [{'p': 10, 'max': 20, 'min': 15},
{'p': 20, 'max': 30, 'min': 20}]
},
{'101': [{'p': 100, 'max': 200, 'min': 150}],
'102': [{'p': 105, 'max': 205, 'min': 155},
{'p': 102, 'max': 202, 'min': 152}]},
{'103': [{'p': 210, 'max': 2110, 'min': 1115}]}]}
and in code, I have df
only:
df = pd.DataFrame(my_dict)
df
id b
0 1 {'100': [{'p': 10, 'max': 20, 'min': 15}, {'p': 20, 'max': 30, 'min': 20}]}
1 2 {'101': [{'p': 100, 'max': 200, 'min': 150}], '102': [{'p': 105, 'max': 205, 'min': 155}, {'p': 102, 'max': 202, 'min': 152}]}
2 3 {'103': [{'p': 210, 'max': 2110, 'min': 1115}]}
Now, I want to flat the column b
like the following:
df
id key p max min
0 1 100 10 20 15
1 1 100 20 30 20
2 2 101 100 200 150
3 2 102 105 205 155
4 2 102 102 202 152
5 3 103 210 2120 1115
I read about the explode
and pd.json_normalize
. But they did not help. What is the most efficient solution for this problem?
CodePudding user response:
Use a nested list comprehension:
pd.DataFrame([{'id': i, 'key': k, 'p': i, **x}
for i, d in zip(my_dict['id'], my_dict['b'])
for k, l in d.items() for x in l])
Output:
id key p max min
0 1 100 10 20 15
1 1 100 20 30 20
2 2 101 100 200 150
3 2 102 105 205 155
4 2 102 102 202 152
5 3 103 210 2110 1115
CodePudding user response:
You can pre-process the dictionary then explode
and json_normalize
the dataframe
my_dict['b'] = [[d_ | {'key': k} for k, lst in d.items() for d_ in lst]
for d in my_dict['b']]
df = (pd.DataFrame(my_dict).explode('b', ignore_index=True)
.pipe(lambda d: d.join(pd.json_normalize(d.pop('b')))))
pprint(my_dict) # from pprint import pprint
{'b': [[{'key': '100', 'max': 20, 'min': 15, 'p': 10},
{'key': '100', 'max': 30, 'min': 20, 'p': 20}],
[{'key': '101', 'max': 200, 'min': 150, 'p': 100},
{'key': '102', 'max': 205, 'min': 155, 'p': 105},
{'key': '102', 'max': 202, 'min': 152, 'p': 102}],
[{'key': '103', 'max': 2110, 'min': 1115, 'p': 210}]],
'id': [1, 2, 3]}
print(pd.DataFrame(my_dict))
id b
0 1 [{'p': 10, 'max': 20, 'min': 15, 'key': '100'}...
1 2 [{'p': 100, 'max': 200, 'min': 150, 'key': '10...
2 3 [{'p': 210, 'max': 2110, 'min': 1115, 'key': '...
print(df)
id p max min key
0 1 10 20 15 100
1 1 20 30 20 100
2 2 100 200 150 101
3 2 105 205 155 102
4 2 102 202 152 102
5 3 210 2110 1115 103