I have a df similar to the one below where it has a second columns of categories. I want each category to be its own column like in the second example below.
df:
col1 col2 col3
sally grade nine
joe grade ten
mary age eight
sue age eight
john height 5'9
john age twelve
john fav_subject math
john fav_subject english
desired:
col1 grade age height fav_subject
sally nine
Joe ten
mary eight
sue eight
john twelve 5'9 math, english
CodePudding user response:
Use pivot_table
:
In [596]: out = df.pivot_table(index='col1', columns='col2', values='col3', aggfunc=','.join, fill_value='')
Out[600]:
col2 age fav_subject grade height
col1
joe ten
john twelve math,english 5'9
mary eight
sally nine
sue eight
CodePudding user response:
Here is a way using groupby()
. By setting sort = False
, this will make the output match the row and column order that you have your expected output df
in.
(df.groupby(['col1','col2'],sort=False)['col3']
.agg(','.join)
.unstack(level=1)
.rename_axis(None,axis=1)
.reset_index())
Output:
col1 grade age height fav_subject
0 sally nine NaN NaN NaN
1 joe ten NaN NaN NaN
2 mary NaN eight NaN NaN
3 sue NaN eight NaN NaN
4 john NaN twelve 5'9 math,english