Home > Back-end >  How to convert a nested dictionary with lists to a dataframe in this format
How to convert a nested dictionary with lists to a dataframe in this format

Time:04-12

What can I do to convert a nested dictionary like this example:

nested_dict = { 'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
                'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}

to a dataframe of this format?:

Col1      Col2       Col3  
June      Girl       45
June      Girl       32
Samantha  Girl       14
Samantha  Girl       34
Samantha  Girl       65
Brad      Boy        12
Brad      Boy        54
Brad      Boy        12
Chad      Boy        12

failed attempt:

df = pd.DataFrame.from_dict(nested_dict, orient='index')

df
Out[17]: 
           June      Samantha          Brad  Chad
Girls  [45, 32]  [14, 34, 65]           NaN   NaN
Boys        NaN           NaN  [12, 54, 12]  [12]

CodePudding user response:

You can use stack and explode:

import pandas as pd

nested_dict = { 'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
                'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}

df = pd.DataFrame.from_dict(nested_dict, orient='index')
print(df.stack().explode())

Output:

Girl  June        45
      June        32
      Samantha    14
      Samantha    34
      Samantha    65
Boy   Brad        12
      Brad        54
      Brad        12
      Chad        12

CodePudding user response:

You can't use the option orient="index" because that converts each dictionary key into a single row, but your original dictionary is more complexly structured.

You have to manually reprocess the dictionary before converting it because the original data you have are normalized - they is nested according to shared data values. The DataFrame object you want to create is explicitly NOT normalized.

Here is some code to de-normalize your data so it can be converted:

nested_dict = {
   'Girl': { 'June': [45, 32], 'Samantha': [14, 34, 65] },
   'Boy': { 'Brad': [12, 54, 12], 'Chad': [12] }
}

metaDF = { 'Col1' : [], 'Col2' : [], 'Col3' : [] }

for gender in nested_dict:
   for person in nested_dict[gender]:
      for age in nested_dict[gender][person]:
         metaDF['Col1'].append( person )
         metaDF['Col2'].append( gender )
         metaDF['Col3'].append( age )

df = pd.DataFrame.from_dict( metaDF, orient="columns" )

Reference:

  • Related