split pandas rows into columns and comma separate duplicates-CodePudding

I have a df similar to the one below where it has a second columns of categories. I want each category to be its own column like in the second example below.

df:

col1        col2         col3 
sally       grade        nine
joe         grade        ten
mary        age         eight
sue         age         eight
john        height      5'9
john        age         twelve
john      fav_subject    math
john      fav_subject   english

desired:

col1      grade   age    height   fav_subject
sally       nine  
Joe         ten
mary               eight
sue                eight
john               twelve    5'9    math, english

CodePudding user response：

Use pivot_table:

In [596]: out = df.pivot_table(index='col1', columns='col2', values='col3', aggfunc=','.join, fill_value='')

Out[600]: 
col2      age   fav_subject grade height
col1                                    
joe                           ten       
john   twelve  math,english          5'9
mary    eight                           
sally                        nine       
sue     eight

CodePudding user response：

Here is a way using groupby(). By setting sort = False, this will make the output match the row and column order that you have your expected output df in.

(df.groupby(['col1','col2'],sort=False)['col3']
.agg(','.join)
.unstack(level=1)
.rename_axis(None,axis=1)
.reset_index())

Output:

    col1 grade     age height   fav_subject
0  sally  nine     NaN    NaN           NaN
1    joe   ten     NaN    NaN           NaN
2   mary   NaN   eight    NaN           NaN
3    sue   NaN   eight    NaN           NaN
4   john   NaN  twelve    5'9  math,english