I have a data frame that looks like this:
import pandas as pd

a = {'price': [1, 2],
     'nested_column':
         [[{'key': 'code', 'value': 'A', 'label': 'rif1'},
           {'key': 'datemod', 'value': '31/09/2022', 'label': 'mod'}],
          [{'key': 'code', 'value': 'B', 'label': 'rif2'},
           {'key': 'datemod', 'value': '31/08/2022', 'label': 'mod'}]]}
df = pd.DataFrame(data=a)
My expected output should look like this:
b = {'price': [1, 2],
     'code': ["A", "B"],
     'datemod': ["31/09/2022", "31/08/2022"]}
exp_df = pd.DataFrame(data=b)
I tried the following lines, but unfortunately they don't do the job:
df = pd.concat([df.drop(['nested_column'], axis=1), df['nested_column'].apply(pd.Series)], axis=1)
df = pd.concat([df.drop([0], axis=1), df[0].apply(pd.Series)], axis=1)
CodePudding user response:
You can pop and explode your column to feed to json_normalize, then pivot according to the desired key/value and join:
# pop the JSON column and explode it to one row per dictionary
s = df.pop('nested_column').explode()
df = df.join(
    pd.json_normalize(s)        # normalize dictionaries to columns
      .assign(idx=s.index)      # ensure same index
      .pivot(index='idx', columns='key', values='value')
)
Output:
   price code     datemod
0      1    A  31/09/2022
1      2    B  31/08/2022
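To make the chain above easier to follow, here is a minimal sketch that splits it into named steps. The names long_df, wide_df and result are mine, not from the answer, and I pass s.tolist() to json_normalize so the intermediate frame has a plain 0..n-1 index regardless of how a given pandas version treats a Series input.

```python
import pandas as pd

a = {'price': [1, 2],
     'nested_column': [
         [{'key': 'code', 'value': 'A', 'label': 'rif1'},
          {'key': 'datemod', 'value': '31/09/2022', 'label': 'mod'}],
         [{'key': 'code', 'value': 'B', 'label': 'rif2'},
          {'key': 'datemod', 'value': '31/08/2022', 'label': 'mod'}]]}
df = pd.DataFrame(a)

# Step 1: one row per dictionary; the original row label repeats in the index
s = df.pop('nested_column').explode()

# Step 2: one column per dictionary field (key, value, label),
# plus an 'idx' column remembering which original row each dict came from
long_df = pd.json_normalize(s.tolist()).assign(idx=s.index)

# Step 3: reshape so that each distinct 'key' becomes its own column
wide_df = long_df.pivot(index='idx', columns='key', values='value')

# Step 4: align on the row labels and attach to the remaining columns
result = df.join(wide_df)
print(result)
```

Note that pivot raises an error if the same key appears twice for one row, so this assumes keys are unique within each list.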
CodePudding user response:
Get key: value pairs from the nested dictionaries and flatten the values with json_normalize:
# build one {key: value} dictionary per row
f = lambda x: {y['key']: y['value'] for y in x}
df['nested_column'] = df['nested_column'].apply(f)
print(df)
price nested_column
0 1 {'code': 'A', 'datemod': '31/09/2022'}
1 2 {'code': 'B', 'datemod': '31/08/2022'}
df1 = df.join(pd.json_normalize(df.pop('nested_column')))
print(df1)
price code datemod
0 1 A 31/09/2022
1 2 B 31/08/2022
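If you need this more than once, the same idea can be wrapped in a small helper. This is only a sketch under my own assumptions: flatten_key_value is a hypothetical name, and I build the flat frame with pd.DataFrame(..., index=df.index) instead of json_normalize so that the join also lines up when df does not have the default 0..n-1 index.

```python
import pandas as pd

def flatten_key_value(df, column):
    """Return a new frame with `column` replaced by one column per 'key'.

    Hypothetical helper; assumes each cell holds a list of dicts
    that all have 'key' and 'value' entries.
    """
    # Collapse each list of dicts into a single {key: value} dict
    records = [{d['key']: d['value'] for d in items} for items in df[column]]
    # Reuse df's index so the join aligns rows even for a non-default index
    flat = pd.DataFrame(records, index=df.index)
    return df.drop(columns=column).join(flat)

a = {'price': [1, 2],
     'nested_column': [
         [{'key': 'code', 'value': 'A', 'label': 'rif1'},
          {'key': 'datemod', 'value': '31/09/2022', 'label': 'mod'}],
         [{'key': 'code', 'value': 'B', 'label': 'rif2'},
          {'key': 'datemod', 'value': '31/08/2022', 'label': 'mod'}]]}

# Non-default row labels, to show that alignment is by label
df = pd.DataFrame(a, index=['x', 'y'])
out = flatten_key_value(df, 'nested_column')
print(out)
```

Unlike the pop-based version, this leaves the original df untouched.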
CodePudding user response:
A more pythonic approach: I create dictionary b from a. For each variable, I collect the values whose 'key' entry matches that variable's name.
n = len(a['nested_column'])     # number of rows
m = len(a['nested_column'][0])  # number of dictionaries per row
b = {}
b['price'] = a['price']
for var in ['code', 'datemod']:
    b[var] = [a['nested_column'][i][j]['value'] for i in range(n) for j in range(m) if a['nested_column'][i][j]['key'] == var]
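The same loop can also be written by iterating over the rows and dictionaries directly rather than by position, which drops the assumption that every row holds the same number of dictionaries. A minimal sketch; the name wanted is mine, and the final line just compares the result with the expected frame from the question.

```python
import pandas as pd

a = {'price': [1, 2],
     'nested_column': [
         [{'key': 'code', 'value': 'A', 'label': 'rif1'},
          {'key': 'datemod', 'value': '31/09/2022', 'label': 'mod'}],
         [{'key': 'code', 'value': 'B', 'label': 'rif2'},
          {'key': 'datemod', 'value': '31/08/2022', 'label': 'mod'}]]}

wanted = ['code', 'datemod']  # keys to extract, hardcoded as in the answer above

b = {'price': a['price']}
for var in wanted:
    # Walk the rows, then the dicts inside each row, keeping matching values
    b[var] = [d['value'] for row in a['nested_column'] for d in row if d['key'] == var]

exp_df = pd.DataFrame({'price': [1, 2],
                       'code': ['A', 'B'],
                       'datemod': ['31/09/2022', '31/08/2022']})
matches = pd.DataFrame(b).equals(exp_df)
print(matches)
```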
CodePudding user response:
I'm a fan of doing operations such as this outside of Pandas, primarily for speed - can't argue that @mozway's solution is pleasing to the eye though :)
Export df to a list of record dictionaries:
mapping = df.to_dict('records')
Iterate through the records to build a defaultdict of lists:
from collections import defaultdict

out = defaultdict(list)
for entry in mapping:
    for key, value in entry.items():
        if key == 'price':
            out[key].append(value)
        else:
            # value is the list of nested dictionaries for this row
            for ent in value:
                if ent['key'] == "code":
                    out["code"].append(ent["value"])
                else:
                    out["datemod"].append(ent["value"])

pd.DataFrame(out)
price code datemod
0 1 A 31/09/2022
1 2 B 31/08/2022
You could reduce the number of trips by iterating through a directly (or exporting df as df.to_dict('list')):
from itertools import chain

out = defaultdict(list)
for key, value in a.items():
    if key == "price":
        out[key].extend(value)
    else:
        # flatten the list of lists into one stream of dictionaries
        value = chain.from_iterable(value)
        for ent in value:
            if ent['key'] == 'code':
                out['code'].append(ent['value'])
            else:
                out['datemod'].append(ent['value'])

pd.DataFrame(out)
price code datemod
0 1 A 31/09/2022
1 2 B 31/08/2022
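If the set of keys is not known in advance, the if/else on 'code' can probably be dropped by using each dictionary's own 'key' as the output column name. A sketch of that variation; the variable nested is my own name for the column to unpack, and it assumes every row contains each key exactly once (otherwise the lists end up with different lengths and pd.DataFrame raises a ValueError).

```python
from collections import defaultdict
from itertools import chain

import pandas as pd

a = {'price': [1, 2],
     'nested_column': [
         [{'key': 'code', 'value': 'A', 'label': 'rif1'},
          {'key': 'datemod', 'value': '31/09/2022', 'label': 'mod'}],
         [{'key': 'code', 'value': 'B', 'label': 'rif2'},
          {'key': 'datemod', 'value': '31/08/2022', 'label': 'mod'}]]}

nested = 'nested_column'  # column holding the lists of key/value dictionaries

out = defaultdict(list)
for name, values in a.items():
    if name == nested:
        # Flatten the list of lists and file each value under its own key
        for ent in chain.from_iterable(values):
            out[ent['key']].append(ent['value'])
    else:
        # Ordinary columns are copied over unchanged
        out[name].extend(values)

result = pd.DataFrame(out)
print(result)
```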