Convert nested dictionary to multilevel column dataframe-CodePudding

I have a dictionary which I want to convert to multilevel column dataframe and the index will be the most outer keys of the dictionary.

my_dict = {'key1': {'sub-key1': {'sub-sub-key1':'a','sub-sub-key2':'b'}, 'sub-key2': {'sub-sub-key1':'aa','sub-sub-key2':'bb'}},
    'key2': {'sub-key1': {'sub-sub-key1':'c','sub-sub-key2':'d'}, 'sub-key2': {'sub-sub-key1':'cc','sub-sub-key2':'dd'}}}

My desired output should look like:

               sub-key1                        sub-key2
    sub-sub-key1    sub-sub-key2     sub-sub-key1    sub-sub-key2
key1    a               b                aa               bb
key2    c               d                cc               dd

I tried to use concat with pd.concat({k: pd.DataFrame.from_dict(my_dict, orient='index') for k, v in d.items()}, axis=1) but the result is not as expected.

I also tried to reform the dictionary.

reformed_dict = {}
for outerKey, innerDict in my_dict.items():
    for innerKey, values in innerDict.items():
        reformed_dict[(outerKey, innerKey)] = values
pd.DataFrame(reformed_dict)

Again the result was not ok. The highest level column and index are interchanged.

Is there any other way to do this?

CodePudding user response：

You were pretty close with concat, need to unstack after so like

res = pd.concat({k: pd.DataFrame.from_dict(v, orient='columns') 
                 for k, v in my_dict.items()}
         ).unstack()
print(res)
#          sub-key1                  sub-key2             
#      sub-sub-key1 sub-sub-key2 sub-sub-key1 sub-sub-key2
# key1            a            b           aa           bb
# key2            c            d           cc           dd

CodePudding user response：

Try this one-liner that uses pd.concat, dict comphrension and pd.from_dict to format the dataframe, and pd.unstack to adjust the dataframe structure.

df = pd.concat({k: pd.DataFrame.from_dict(v) for k, v in my_dict.items()}).unstack()

result:

              sub-key1                        sub-key2
    sub-sub-key1    sub-sub-key2    sub-sub-key1    sub-sub-key2
key1    a                 b              aa                bb
key2    c                 d              cc                dd

CodePudding user response：

First reform your dictionary:

>>> reform = {(outerKey, innerKey): values for index, outerDict in my_dict.items() for outerKey, innerDict in outerDict.items() for innerKey, values in innerDict.items()}
>>> reform
{('sub-key1', 'sub-sub-key1'): 'c',
 ('sub-key1', 'sub-sub-key2'): 'd',
 ('sub-key2', 'sub-sub-key1'): 'cc',
 ('sub-key2', 'sub-sub-key2'): 'dd'}

Then make a dataframe with the right index names:

>>> df = pd.DataFrame(reform, index=d.keys())
>>> df
         sub-key1                  sub-key2             
     sub-sub-key1 sub-sub-key2 sub-sub-key1 sub-sub-key2
key1            c            d           cc           dd
key2            c            d           cc           dd