in Python I have the following multi index data frame:
level 3 | col 1 | col 1 | col 2 | col 3 | |||
---|---|---|---|---|---|---|---|
level 1 | level 2 | level 3 | level 4 | col 1a | col 1b | col 2 | col 3 |
A | Aa | Aaa | type1 | val 1 | val 2 | val 3 | val 4 |
A | Aa | Aaa | type2 | val 5 | val 6 | val 7 | val 8 |
A | Ab | Aab | type1 | val 9 | val 10 | ... | ... |
A | Ac | Aac | type2 | ... | ... | ... | ... |
B | Ba | Baa | type1 | ... | ... | ... | ... |
B | Ba | Bab | type3 | ... | ... | ... | ... |
C | Ca | Caa | type4 | ... | ... | ... | ... |
To explain the data frame:
- First 2 rows are multi index headers
- First 4 columns are multi index headers
- level 1, 2, 3, 4 and empty spaces are index area
From this list, my aim is to get a dictionary such as:
- First option is A , B or C
- If clicked on A, next choice is Aa , Ab or Ac
- If clicked on Aa, next choice is only Aaa (if only Aaa is available)
- If clicked on Aaa, next choice is type 1 or type 2
- If clicked on type1, next choice is col 1 or col 2 or col3
- If clicked on col1, next choice is col 1a or col 1b
- If clicked on col 1a, we find the val 1 as final output
Please, give me directions on how to do this transformation. Also, new viewpoints and suggestions about how to approach this case is very appreciated as well.
CodePudding user response:
Suppose you have the following dataframe df
:
col 1 col 2 col 3
col 1a col 1b col 2 col 3
A Aa Aaa type1 val 1 val 2 val 3 val 4
type2 val 5 val 6 val 7 val 8
Ab Aab type1 val 9 val 10 val 11 val 12
Ac Aac type2 val 13 val 14 val 15 val 16
B Ba Baa type1 val 17 val 18 val 19 val 20
Bab type3 val 21 val 22 val 23 val 24
C Ca Caa type4 val 25 val 26 val 27 val 28
Step 1: Further flattening:
df_sub = (
df.loc[:, [("col 1", "col 1a"), ("col 1", "col 1b")]]
.melt(ignore_index=False)
.set_index(["variable_0", "variable_1"], append=True)
)
Result:
value
variable_0 variable_1
A Aa Aaa type1 col 1 col 1a val 1
type2 col 1 col 1a val 5
Ab Aab type1 col 1 col 1a val 9
Ac Aac type2 col 1 col 1a val 13
B Ba Baa type1 col 1 col 1a val 17
Bab type3 col 1 col 1a val 21
C Ca Caa type4 col 1 col 1a val 25
A Aa Aaa type1 col 1 col 1b val 2
type2 col 1 col 1b val 6
Ab Aab type1 col 1 col 1b val 10
Ac Aac type2 col 1 col 1b val 14
B Ba Baa type1 col 1 col 1b val 18
Bab type3 col 1 col 1b val 22
C Ca Caa type4 col 1 col 1b val 26
Step 2: Nesting (recursive):
def nest(df):
if df.index.nlevels == 1:
return df.to_dict()[df.columns[0]]
return {
key: nest(df_grp.droplevel(0, axis=0))
for key, df_grp in df.groupby(level=0)
}
result = nest(df_sub)
Result:
{'A': {'Aa': {'Aaa': {'type1': {'col 1': {'col 1a': 'val 1',
'col 1b': 'val 2'}},
'type2': {'col 1': {'col 1a': 'val 5',
'col 1b': 'val 6'}}}},
'Ab': {'Aab': {'type1': {'col 1': {'col 1a': 'val 9',
'col 1b': 'val 10'}}}},
'Ac': {'Aac': {'type2': {'col 1': {'col 1a': 'val 13',
'col 1b': 'val 14'}}}}},
'B': {'Ba': {'Baa': {'type1': {'col 1': {'col 1a': 'val 17',
'col 1b': 'val 18'}}},
'Bab': {'type3': {'col 1': {'col 1a': 'val 21',
'col 1b': 'val 22'}}}}},
'C': {'Ca': {'Caa': {'type4': {'col 1': {'col 1a': 'val 25',
'col 1b': 'val 26'}}}}}}
Is that what you are looking for?
The 2. step could be done differently:
result = {}
for keys, value in df_sub.itertuples():
last = result
for key in keys[:-1]:
last = last.setdefault(key, {})
last[keys[-1]] = value