Convert a nested DataFrame to MultiIndex columns

Time:11-03

From a list of dataclasses

from dataclasses import dataclass

import pandas as pd

@dataclass
class Row:
    name: str
    age: int
    hobbies: pd.DataFrame

charles_hobbies = pd.DataFrame({'activities': ['video_game'], 'sports': ['tennis']})
dash_hobbies = pd.DataFrame({'activities': ['eat'], 'sports': ['soccer']})
rows = []
rows.append(Row(name='Charles', age=24, hobbies=charles_hobbies))
rows.append(Row(name='Dash', age=18, hobbies=dash_hobbies))
print(pd.DataFrame(rows))

    name    age hobbies
0   Charles 24  activities sports 0 video_game tennis
1   Dash    18  activities sports 0 eat soccer

Desired output (the '-' placeholder level is optional):

out = pd.DataFrame([['Charles', 24, 'video_game', 'tennis'], ['Dash', 18, 'eat', 'soccer']])
out.columns = [['name', 'age', 'hobbies', 'hobbies'], ['-', '-', 'activity', 'sport']]
print(out)
    name    age hobbies
    -       -   activity    sport
0   Charles 24  video_game  tennis
1   Dash    18  eat         soccer

Each nested dataframe has exactly one row, so in principle its columns can simply become columns of the outer dataframe without anything more complicated. However, the only approach I can think of is to break the dataclass apart and rebuild the dataframe by joining the hobbies dataframes with another dataframe that holds name and age. Is there a neater way?
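For reference, the workaround described above might look roughly like the sketch below. It assumes every nested frame has exactly one row and the same columns. The names `scalars` and `hobbies` are illustrative only, and `pd.concat(..., axis=1)` stands in for the join since both frames share the default integer index.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Row:
    name: str
    age: int
    hobbies: pd.DataFrame


rows = [
    Row("Charles", 24, pd.DataFrame({"activities": ["video_game"], "sports": ["tennis"]})),
    Row("Dash", 18, pd.DataFrame({"activities": ["eat"], "sports": ["soccer"]})),
]

# Scalar fields: one row per dataclass instance, padded with a '-' second level.
scalars = pd.DataFrame({"name": [r.name for r in rows], "age": [r.age for r in rows]})
scalars.columns = pd.MultiIndex.from_product([scalars.columns, ["-"]])

# Nested fields: stack the single-row frames (assumption: one row each)
# and group their columns under a 'hobbies' top level.
hobbies = pd.concat([r.hobbies for r in rows], ignore_index=True)
hobbies.columns = pd.MultiIndex.from_product([["hobbies"], hobbies.columns])

# Both frames share the index 0..n-1, so a column-wise concat lines them up.
out = pd.concat([scalars, hobbies], axis=1)
print(out)
```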

CodePudding user response:

I think unpacking is your best bet:

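# gather all the data; each nested hobbies frame has exactly one row,
# so its first row can be unpacked straight into the record
out = pd.DataFrame([{'name': r.name, 'age': r.age, **r.hobbies.iloc[0]} for r in rows])

# rename columns: pad the scalar fields with '-', group the rest under 'hobbies'
out.columns = pd.MultiIndex.from_tuples([(x, '-') for x in out.columns[:2]] +
                                        [('hobbies', x) for x in out.columns[2:]]
                                       )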

Output:

      name age     hobbies        
         -   -  activities  sports
0  Charles  24  video_game  tennis
1     Dash  18         eat  soccer
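
If the dataclass grows more fields, the same unpacking idea can probably be written without hard-coding the column positions. The following is only a sketch: `to_multiindex_frame` and its `filler` parameter are hypothetical names, not pandas API, and it assumes a non-empty list of rows whose nested frames each hold exactly one row with the same columns.

```python
from dataclasses import dataclass, fields

import pandas as pd


@dataclass
class Row:
    name: str
    age: int
    hobbies: pd.DataFrame


def to_multiindex_frame(rows, filler="-"):
    """Hypothetical helper: flatten dataclass rows into MultiIndex columns."""
    records = []
    for r in rows:
        record = {}
        for f in fields(r):
            value = getattr(r, f.name)
            if isinstance(value, pd.DataFrame):
                # Assumption: exactly one row per nested frame, so take row 0.
                for col, cell in value.iloc[0].items():
                    record[(f.name, col)] = cell
            else:
                # Scalar field: pad the second column level with the filler.
                record[(f.name, filler)] = value
        records.append(record)

    # Assumption: every record has the same keys as the first one.
    columns = list(records[0].keys())
    data = [[rec[c] for c in columns] for rec in records]
    return pd.DataFrame(data, columns=pd.MultiIndex.from_tuples(columns))


rows = [
    Row("Charles", 24, pd.DataFrame({"activities": ["video_game"], "sports": ["tennis"]})),
    Row("Dash", 18, pd.DataFrame({"activities": ["eat"], "sports": ["soccer"]})),
]
out = to_multiindex_frame(rows)
print(out)
```

The top column level is taken from the dataclass field name, so any other field that holds a one-row DataFrame would be grouped under its own name in the same way.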