Home > Blockchain >  Group data in pivot Pandas
Group data in pivot Pandas

Time:08-12

hello I made a table with the help of a pandas, then I created a pivot with two indexes., I would like to group this data like that, but one index was the header of another. Below I will show what is happening in the tables and what I would like the result Table:

Name Lang Skill Corp
Michael java 2 Google
Piter C 3 Facebook
Cristiano python 5 Google
Michael java 1 Facebook
Piter C 2 Google
Cristiano python 3 Facebook
Michael java 4 Google
Piter C 5 Facebook
Cristiano python 1 Google
Michael python 2 Facebook

I used:

pivot = pd.pivot_table(df, values="Skill", index=["Corp", "Name"], columns = "Lang", aggfunc="sum")

and I have pivot:

Corp Name C java python
Facebook Cristiano nan nan 3
Facebook Michael nan 1 2
Facebook Piter 8 nan nan
Google Cristiano nan nan 6
Google Michael nan 6 nan
Google Piter 2 nan nan

the result I would like:

Name C java python
Facebook sum_fb sum_fb sum_fb
Cristiano nan nan 3
Michael nan 1 2
Piter 8 nan nan
Google sum_google sum_google sum_google
Cristiano nan nan 6
Michael nan 6 nan
Piter 2 nan nan

Thank You in advance

CodePudding user response:

You can aggregate sum by Corp level, which is first by GroupBy.sum, append index to MultiIndex with same values:

df1 = (pivot.groupby(level=0).sum()
            .assign(Name = lambda x: x.index)
            .set_index('Name', append=True))

Or use level=[0,0] for MultiIndex, only necessary set names by DataFrame.rename_axis:

df1 = pivot.groupby(level=[0,0]).sum().rename_axis(['Corp','Name'])
print (df1)         
Lang               C    java  python
Corp     Name                       
Facebook Facebook  8.0   1.0     5.0
Google   Google    2.0   6.0     6.0

Then is appended pivot DataFrame by concat, but is necessary sorting for correct ordering by first level of MultiIndex with DataFrame.sort_index, remove first level Corp by DataFrame.droplevel and last convert Name to column with remove columns name Lang by DataFrame.rename_axis:

df = (pd.concat([df1, pivot])
        .sort_index(level=0, sort_remaining=False)
        .droplevel(0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
        Name  C    java  python
0   Facebook  8.0   1.0     5.0
1  Cristiano  NaN   NaN     3.0
2    Michael  NaN   1.0     2.0
3      Piter  8.0   NaN     NaN
4     Google  2.0   6.0     6.0
5  Cristiano  NaN   NaN     6.0
6    Michael  NaN   6.0     NaN
7      Piter  2.0   NaN     NaN
  • Related