Starting from a list of dataclasses, each holding a nested DataFrame:
import pandas as pd
from dataclasses import dataclass

@dataclass
class Row:
    name: str
    age: int
    hobbies: pd.DataFrame
charles_hobbies = pd.DataFrame({'activities': ['video_game'], 'sports': ['tennis']})
dash_hobbies = pd.DataFrame({'activities': ['eat'], 'sports': ['soccer']})
rows = []
rows.append(Row(name='Charles', age=24, hobbies=charles_hobbies))
rows.append(Row(name='Dash', age=18, hobbies=dash_hobbies))
print(pd.DataFrame(rows))
name age hobbies
0 Charles 24 activities sports 0 video_game tennis
1 Dash 18 activities sports 0 eat soccer
Desired output (it doesn't have to include the "-" level):
out = pd.DataFrame([['Charles', 24, 'video_game', 'tennis'], ['Dash', 18, 'eat', 'soccer']])
out.columns = [['name', 'age', 'hobbies', 'hobbies'], ['-', '-', 'activity', 'sport']]
print(out)
      name age     hobbies
         -   -    activity   sport
0  Charles  24  video_game  tennis
1     Dash  18         eat  soccer
Each nested DataFrame has only one row, so it can technically be expanded so that its columns become columns of the outer DataFrame, without anything more complicated. However, I can't think of a neat way to do this other than breaking the dataclass apart and rebuilding the DataFrame by joining the hobbies DataFrame with another DataFrame containing name and age.
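For reference, the fallback described above (splitting off the scalar fields and joining them back onto the hobbies frames) might look roughly like this. It is only a sketch: the names `base` and `hobbies` are made up for illustration, and it assumes every nested DataFrame has exactly one row and the same columns.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Row:
    name: str
    age: int
    hobbies: pd.DataFrame


rows = [
    Row('Charles', 24, pd.DataFrame({'activities': ['video_game'], 'sports': ['tennis']})),
    Row('Dash', 18, pd.DataFrame({'activities': ['eat'], 'sports': ['soccer']})),
]

# Scalar fields go into their own frame, with '-' as the second column level.
base = pd.DataFrame({'name': [r.name for r in rows], 'age': [r.age for r in rows]})
base.columns = pd.MultiIndex.from_product([base.columns, ['-']])

# Stack the one-row hobbies frames and put them under a 'hobbies' top level.
hobbies = pd.concat([r.hobbies for r in rows], ignore_index=True)
hobbies.columns = pd.MultiIndex.from_product([['hobbies'], hobbies.columns])

# Join the two pieces side by side; both share the default 0..n-1 index.
out = pd.concat([base, hobbies], axis=1)
print(out)
```

It works, but it needs two intermediate frames and relies on the rows lining up by position, which is why I'm hoping for something neater.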
CodePudding user response:
I think unpacking is your best bet:
# gather all the data
out = pd.DataFrame([{'name': r.name, 'age': r.age, **r.hobbies.iloc[0]} for r in rows])
# rename columns
out.columns = pd.MultiIndex.from_tuples(
    [(x, '-') for x in out.columns[:2]]
    + [('hobbies', x) for x in out.columns[2:]]
)
Output:
      name age     hobbies
         -   -  activities  sports
0  Charles  24  video_game  tennis
1     Dash  18         eat  soccer
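Once the columns are two-level, the nested fields can be pulled out as a group. Here is a self-contained sketch of that, rebuilding the frame with the same unpacking approach (I use `.to_dict()` on the single row just to make the unpacking explicit; `hobbies_only` and `names` are illustrative names, and one row per nested frame is assumed):

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Row:
    name: str
    age: int
    hobbies: pd.DataFrame


rows = [
    Row('Charles', 24, pd.DataFrame({'activities': ['video_game'], 'sports': ['tennis']})),
    Row('Dash', 18, pd.DataFrame({'activities': ['eat'], 'sports': ['soccer']})),
]

# One flat dict per row: scalar fields plus the single hobbies row.
out = pd.DataFrame(
    [{'name': r.name, 'age': r.age, **r.hobbies.iloc[0].to_dict()} for r in rows]
)
out.columns = pd.MultiIndex.from_tuples(
    [(x, '-') for x in out.columns[:2]]
    + [('hobbies', x) for x in out.columns[2:]]
)

# Selecting the top-level key returns just the nested columns.
hobbies_only = out['hobbies']
# A full tuple selects a single column as a Series.
names = out[('name', '-')]

print(hobbies_only)
print(names.tolist())
```

If you don't need the `-` placeholder level, you can skip the renaming step and keep the flat column names instead.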