Converting multi index data frame to nested dictionaries

in Python I have the following multi index data frame:

			level 3	col 1	col 1	col 2	col 3
level 1	level 2	level 3	level 4	col 1a	col 1b	col 2	col 3
A	Aa	Aaa	type1	val 1	val 2	val 3	val 4
A	Aa	Aaa	type2	val 5	val 6	val 7	val 8
A	Ab	Aab	type1	val 9	val 10	...	...
A	Ac	Aac	type2	...	...	...	...
B	Ba	Baa	type1	...	...	...	...
B	Ba	Bab	type3	...	...	...	...
C	Ca	Caa	type4	...	...	...	...

To explain the data frame:

First 2 rows are multi index headers
First 4 columns are multi index headers
level 1, 2, 3, 4 and empty spaces are index area

From this list, my aim is to get a dictionary such as:

First option is A , B or C
If clicked on A, next choice is Aa , Ab or Ac
If clicked on Aa, next choice is only Aaa (if only Aaa is available)
If clicked on Aaa, next choice is type 1 or type 2
If clicked on type1, next choice is col 1 or col 2 or col3
If clicked on col1, next choice is col 1a or col 1b
If clicked on col 1a, we find the val 1 as final output

Please, give me directions on how to do this transformation. Also, new viewpoints and suggestions about how to approach this case is very appreciated as well.

CodePudding user response：

Suppose you have the following dataframe df:

                 col 1           col 2   col 3
                col 1a  col 1b   col 2   col 3
A Aa Aaa type1   val 1   val 2   val 3   val 4
         type2   val 5   val 6   val 7   val 8
  Ab Aab type1   val 9  val 10  val 11  val 12
  Ac Aac type2  val 13  val 14  val 15  val 16
B Ba Baa type1  val 17  val 18  val 19  val 20
     Bab type3  val 21  val 22  val 23  val 24
C Ca Caa type4  val 25  val 26  val 27  val 28

Step 1: Further flattening:

df_sub = (
    df.loc[:, [("col 1", "col 1a"), ("col 1", "col 1b")]]
      .melt(ignore_index=False)
      .set_index(["variable_0", "variable_1"], append=True)
)

Result:

                                       value
               variable_0 variable_1        
A Aa Aaa type1 col 1      col 1a       val 1
         type2 col 1      col 1a       val 5
  Ab Aab type1 col 1      col 1a       val 9
  Ac Aac type2 col 1      col 1a      val 13
B Ba Baa type1 col 1      col 1a      val 17
     Bab type3 col 1      col 1a      val 21
C Ca Caa type4 col 1      col 1a      val 25
A Aa Aaa type1 col 1      col 1b       val 2
         type2 col 1      col 1b       val 6
  Ab Aab type1 col 1      col 1b      val 10
  Ac Aac type2 col 1      col 1b      val 14
B Ba Baa type1 col 1      col 1b      val 18
     Bab type3 col 1      col 1b      val 22
C Ca Caa type4 col 1      col 1b      val 26

Step 2: Nesting (recursive):

def nest(df):
    if df.index.nlevels == 1:
        return df.to_dict()[df.columns[0]]
    return {
        key: nest(df_grp.droplevel(0, axis=0))
        for key, df_grp in df.groupby(level=0)
    }

result = nest(df_sub)

Result:

{'A': {'Aa': {'Aaa': {'type1': {'col 1': {'col 1a': 'val 1',
                                          'col 1b': 'val 2'}},
                      'type2': {'col 1': {'col 1a': 'val 5',
                                          'col 1b': 'val 6'}}}},
       'Ab': {'Aab': {'type1': {'col 1': {'col 1a': 'val 9',
                                          'col 1b': 'val 10'}}}},
       'Ac': {'Aac': {'type2': {'col 1': {'col 1a': 'val 13',
                                          'col 1b': 'val 14'}}}}},
 'B': {'Ba': {'Baa': {'type1': {'col 1': {'col 1a': 'val 17',
                                          'col 1b': 'val 18'}}},
              'Bab': {'type3': {'col 1': {'col 1a': 'val 21',
                                          'col 1b': 'val 22'}}}}},
 'C': {'Ca': {'Caa': {'type4': {'col 1': {'col 1a': 'val 25',
                                          'col 1b': 'val 26'}}}}}}

Is that what you are looking for?

The 2. step could be done differently:

result = {}
for keys, value in df_sub.itertuples():
    last = result
    for key in keys[:-1]:
        last = last.setdefault(key, {})
    last[keys[-1]] = value