I hope I can describe the problem clearly. I have a pandas df with 2 columns (date, person) that looks like this:
date person
0 2002-09-04 [{'name':'anna', 'weight':'2.9', 'hospital':'x'}, {'name': 'jacob', ...}, ...]
1 2002-10-16 [{'name':'lynn', 'weight':'3.0', 'hospital':'y'}, {'name': 'tony', ...}, ...]
2 ...
3
and want to turn it into this:
date name weight hospital
0 2002-09-04 anna 2.9 x
1 2002-09-04 jacob ... ...
.
.
.
n 2002-10-16 lynn 3.0 y
n1 2002-10-16 tony ... ...
Originally there are more columns with important information. Therefore I want to create a new df to make the data in 'person' more accessible. I tried appending the dict lists row by row.
So far I have:
df_person = pd.DataFrame()
for row, _ in enumerate(df['person']):
    df_person = df_person.append(df['person'][row], ignore_index = True, sort = False)
This gives me the df with name, weight and hospital as columns. But how can I now keep the right dates for each entry?
CodePudding user response:
You can explode your list of dictionaries and convert them to columns with pandas.json_normalize:
out = (df
   .explode('person', ignore_index=True)
   .pipe(lambda d: d.join(pd.json_normalize(d.pop('person'))))
)
output:
date name weight hospital
0 2002-09-04 anna 2.9 x
1 2002-09-04 jacob NaN NaN
2 2002-10-16 lynn 3.0 y
3 2002-10-16 tony NaN NaN
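For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same explode + json_normalize idea, split into separate steps. The sample frame is hypothetical: it is modelled on the question, and the entries that were elided there ('jacob', 'tony') are given only a name, so their other fields come out as NaN. The intermediate names exploded and people are only for illustration.

```python
import pandas as pd

# Hypothetical sample data modelled on the question; the elided
# fields for 'jacob' and 'tony' are simply left out.
df = pd.DataFrame({
    "date": ["2002-09-04", "2002-10-16"],
    "person": [
        [{"name": "anna", "weight": "2.9", "hospital": "x"}, {"name": "jacob"}],
        [{"name": "lynn", "weight": "3.0", "hospital": "y"}, {"name": "tony"}],
    ],
})

# One row per dict. ignore_index=True resets the index to 0..n-1,
# so the positional join below lines up row for row.
exploded = df.explode("person", ignore_index=True)

# Expand each dict into columns; keys missing from a dict become NaN.
people = pd.json_normalize(exploded["person"].tolist())

# Drop the original dict column and attach the expanded columns.
# Any other columns in df are repeated for each person and kept.
out = exploded.drop(columns="person").join(people)
print(out)
```

Because the date (and any other column) is repeated for every exploded row, each person keeps the date of the row it came from. Note that weight is still a string here, as in the question; pd.to_numeric(out["weight"]) should convert it if numbers are needed.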