i have dataframe like this:
id date value
1 2021-09-01 [{'value': 'Keyman', 'sum': 8932.199999999999}
{'value': 'Suluboya', 'sum': 5896.7499999999945}]
2 2021-10-01 [{'value': 'Atelier', 'sum': 5595.299999999992}
{'value': 'Atölye4D', 'sum': 3538.7299999999987}]
expected output:
idx date value sum
1 2021-09-01 Keyman 8932.199999999999
2 2021-09-01 Suluboya 5896.7499999999945
3 2021-10-01 Atelier 5595.299999999992
4 2021-10-01 Atölye4D 3538.7299999999987
i tried:
df = pd.json_normalize(df['value'][0])
this was for the first row only. It works but there is a lot of data than that and I believe there is a shorter way. I tried this too.
def only_dict(d):
return ast.literal_eval(d) if pd.notnull(d) else {}
dfy = json_normalize(df['value'].apply(only_dict).tolist()).add_prefix('_')
but i am getting this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
CodePudding user response:
Assuming your df to be:
In [577]: df = pd.DataFrame({'id':[1,2], 'date':['2021-09-01', '2021-10-01'], 'value':[[{'value': 'Keyman', 'sum': 8932.199999999999},{'value': 'Suluboya', 'sum': 5896.7499999999945}], [{'value': 'Atelier', 'sum': 5595.299
...: 999999992},{'value': 'Atölye4D', 'sum': 3538.7299999999987}]]})
In [578]: df
Out[578]:
id date value
0 1 2021-09-01 [{'value': 'Keyman', 'sum': 8932.199999999999}...
1 2 2021-10-01 [{'value': 'Atelier', 'sum': 5595.299999999992...
You can use df.explode
with df.apply
:
In [581]: df1 = df.explode('value')
In [585]: df1 = pd.concat([df1['date'], df1.value.apply(pd.Series)], 1)
In [586]: df1
Out[586]:
date value sum
0 2021-09-01 Keyman 8932.20
0 2021-09-01 Suluboya 5896.75
1 2021-10-01 Atelier 5595.30
1 2021-10-01 Atölye4D 3538.73
OR use pd.json_normalize
:
In [587]: df1 = df.explode('value')
In [597]: df1 = pd.concat([df1['date'].reset_index(drop=True), pd.json_normalize(df1['value'])], 1)
In [598]: df1
Out[598]:
date value sum
0 2021-09-01 Keyman 8932.20
1 2021-09-01 Suluboya 5896.75
2 2021-10-01 Atelier 5595.30
3 2021-10-01 Atölye4D 3538.73