Home > front end >  Convert deep nested dictionary to pandas dataframe
Convert deep nested dictionary to pandas dataframe

Time:10-18

Let's take the following dictionary as an example

d = {
    'A': {
        'a1': {
            'col1': 1,
            'col2': 2
        },
        'a2': {
            'col1': 3,
            'col2': 4
        }        
    },
    'B': {
        'b1': {
            'col1': 5,
            'col2': 6
        },
        'b2': {
            'col1': 7,
            'col2': 8
        }        
    }    
}

I would like to convert that to a pandas DataFrame which should look like this

first   second  col1  col2 
--------------------------
A       a1      1     2
        a2      3     4
B       b1      5     6
        b2      7     8

Can anyone help on how to do that?

CodePudding user response:

One can use pandas.DataFrame.from_dict as follows

df_new = pd.DataFrame.from_dict({(i,j): d[i][j]
                                for i in d.keys()
                                for j in d[i].keys()},
                            orient='index').rename_axis(['first', 'second'])

[Out]:

              col1  col2
first second            
A     a1         1     2
      a2         3     4
B     b1         5     6
      b2         7     8

CodePudding user response:

You can flatten values to tuples and then use DataFrame.pivot:

df = (pd.DataFrame([(k, k1, k2, v2) for k, v in d.items() 
                                    for k1, v1 in v.items() 
                                    for k2, v2 in v1.items()])
        .pivot(index=[0,1], columns=2, values=3)
        .rename_axis(index=['first','second'], columns=None))
print (df)
              col1  col2
first second            
A     a1         1     2
      a2         3     4
B     b1         5     6
      b2         7     8

Alternative solution with defaultdict:

from collections import defaultdict

d1 = defaultdict(dict)

for k, v in d.items():
    for k1, v1 in v.items():
        for k2, v2 in v1.items():
            d1[k2].update({(k, k1): v2})

df = pd.DataFrame(d1).rename_axis(index=['first','second'])
print(df)
              col1  col2
first second            
A     a1         1     2
      a2         3     4
B     b1         5     6
      b2         7     8
  • Related