I have a pandas dataframe (sample) as follows
df = pd.DataFrame({'Country':['India', 'China', 'Nepal'],
'Habitat':[[{'city1':'Ind1','city2':'Ind2'},{'town1':'IndT1','town2':'IndT2'}],
[{'city1':'Chi1','city2':'Chi2'},{'town1':'ChiT1','town2':'ChiT2'}],
[{'city1':'Nep1','city2':'Nep2'},{'town1':'NepT1','town2':'NepT2'}]],
'num':[1,2,3]
})
df
Country Habitat num
0 India [{'city1':'Ind1','city2':'Ind2'},{'town1':'IndT1','town2':'IndT2'}] 1
1 China [{'city1':'Chi1','city2':'Chi2'},{'town1':'ChiT1','town2':'ChiT2'}] 2
2 Nepal [{'city1':'Nep1','city2':'Nep2'},{'town1':'NepT1','town2':'NepT2'}] 3
I need to flatten this out in this format.
result_df = pd.DataFrame({'Country':['India', 'China', 'Nepal'],
'Habitat.city1':['Ind1','Chi1','Nep1'],
'Habitat.city2':['Ind2','Chi2','Nep2'],
'Habitat.town1':['IndT1','ChiT1','NepT1'],
'Habitat.town2':['IndT2','ChiT2','NepT2'],
'num':[1,2,3]
})
result_df
Country Habitat.city1 Habitat.city2 Habitat.town1 Habitat.town2 num
India Ind1 Ind2 IndT1 IndT2 1
China Chi1 Chi2 ChiT1 ChiT2 2
Nepal Nep1 Nep2 NepT1 NepT2 3
I have tried pd.json_normalize(df.explode('Habitat')['Habitat])
but it creates new rows which I do not need.
My observation:
Some form of groupby and transpose that can properly build on pd.json_normalize(df.explode('Habitat')['Habitat])
could solve my problem but so far I have not had any luck
CodePudding user response:
Let us use ChainMap
to merge the list of dictionaries in each row, then create a new dataframe and join
back with original dataframe
from itertools import starmap
from collections import ChainMap
h = pd.DataFrame(starmap(ChainMap, df['Habitat']), df.index)
df.join(h.add_prefix('Habitat.'))
Country Habitat num Habitat.city1 Habitat.city2 Habitat.town1 Habitat.town2
0 India [{'city1': 'Ind1', 'city2': 'Ind2'}, {'town1': 'IndT1', 'town2': 'IndT2'}] 1 Ind1 Ind2 IndT1 IndT2
1 China [{'city1': 'Chi1', 'city2': 'Chi2'}, {'town1': 'ChiT1', 'town2': 'ChiT2'}] 2 Chi1 Chi2 ChiT1 ChiT2
2 Nepal [{'city1': 'Nep1', 'city2': 'Nep2'}, {'town1': 'NepT1', 'town2': 'NepT2'}] 3 Nep1 Nep2 NepT1 NepT2
CodePudding user response:
In Python 3.9 you can use the dictionary union, as follows:
import pandas as pd
from operator import or_
from itertools import starmap
flat = pd.DataFrame(starmap(or_, df['Habitat']), df.index).add_prefix('Habitat.')
res = pd.concat([df.drop(labels=['Habitat'], axis=1), flat], axis=1)
print(res)
Output
Country num Habitat.city1 Habitat.city2 Habitat.town1 Habitat.town2
0 India 1 Ind1 Ind2 IndT1 IndT2
1 China 2 Chi1 Chi2 ChiT1 ChiT2
2 Nepal 3 Nep1 Nep2 NepT1 NepT2
The operator or_
will call the |
implementation of the objects under-the-hood, from the documentation:
Return the bitwise or of a and b.
For the case of dictionaries the |
is the union. For more information on how to merge two dictionaries see this answer.
An alternative solution that works for any number of dictionaries, is to use functools.reduce
:
import pandas as pd
from operator import or_
from functools import reduce, partial
merge = partial(reduce, or_)
flat = pd.DataFrame(map(merge, df['Habitat']), df.index).add_prefix('Habitat.')
res = pd.concat([df.drop(labels=['Habitat'], axis=1), flat], axis=1)