hello I made a table with the help of a pandas, then I created a pivot with two indexes., I would like to group this data like that, but one index was the header of another. Below I will show what is happening in the tables and what I would like the result Table:
Name | Lang | Skill | Corp |
---|---|---|---|
Michael | java | 2 | |
Piter | C | 3 | |
Cristiano | python | 5 | |
Michael | java | 1 | |
Piter | C | 2 | |
Cristiano | python | 3 | |
Michael | java | 4 | |
Piter | C | 5 | |
Cristiano | python | 1 | |
Michael | python | 2 |
I used:
pivot = pd.pivot_table(df, values="Skill", index=["Corp", "Name"], columns = "Lang", aggfunc="sum")
and I have pivot:
Corp | Name | C | java | python |
---|---|---|---|---|
Cristiano | nan | nan | 3 | |
Michael | nan | 1 | 2 | |
Piter | 8 | nan | nan | |
Cristiano | nan | nan | 6 | |
Michael | nan | 6 | nan | |
Piter | 2 | nan | nan |
the result I would like:
Name | C | java | python |
---|---|---|---|
sum_fb | sum_fb | sum_fb | |
Cristiano | nan | nan | 3 |
Michael | nan | 1 | 2 |
Piter | 8 | nan | nan |
sum_google | sum_google | sum_google | |
Cristiano | nan | nan | 6 |
Michael | nan | 6 | nan |
Piter | 2 | nan | nan |
Thank You in advance
CodePudding user response:
You can aggregate sum by Corp
level, which is first by GroupBy.sum
, append index to MultiIndex
with same values:
df1 = (pivot.groupby(level=0).sum()
.assign(Name = lambda x: x.index)
.set_index('Name', append=True))
Or use level=[0,0]
for MultiIndex, only necessary set names by DataFrame.rename_axis
:
df1 = pivot.groupby(level=[0,0]).sum().rename_axis(['Corp','Name'])
print (df1)
Lang C java python
Corp Name
Facebook Facebook 8.0 1.0 5.0
Google Google 2.0 6.0 6.0
Then is appended pivot
DataFrame by concat
, but is necessary sorting for correct ordering by first level of MultiIndex with DataFrame.sort_index
, remove first level Corp
by DataFrame.droplevel
and last convert Name
to column with remove columns name Lang
by DataFrame.rename_axis
:
df = (pd.concat([df1, pivot])
.sort_index(level=0, sort_remaining=False)
.droplevel(0)
.reset_index()
.rename_axis(None, axis=1))
print (df)
Name C java python
0 Facebook 8.0 1.0 5.0
1 Cristiano NaN NaN 3.0
2 Michael NaN 1.0 2.0
3 Piter 8.0 NaN NaN
4 Google 2.0 6.0 6.0
5 Cristiano NaN NaN 6.0
6 Michael NaN 6.0 NaN
7 Piter 2.0 NaN NaN