I have the following sample input data:
import pandas as pd
df = pd.DataFrame({'col1': ['x', 'y', 'z'], 'col2': [1, 2, 3], 'col3': ['a', 'a', 'b']})
I would like to sort and group by col3 while interleaving each group's summary row on top of the corresponding rows in col1, to get the following output:
col1 col2
0 a 3
1 x 1
2 y 2
3 b 3
4 z 3
I can of course do the part:
df.sort_values(by=['col3']).groupby(by=['col3']).sum(numeric_only=True)
col2
col3
a 3
b 3
but I am not sure how to interleave the group labels on top of col1.
CodePudding user response:
What about:
# select col2 before summing so the string 'variable' column is not aggregated
(df.melt(id_vars='col2')
   .rename(columns={'value': 'col1'})
   .groupby('col1')['col2'].sum()
   .reset_index()
)
output:
col1 col2
0 a 3
1 b 3
2 x 1
3 y 2
4 z 3
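To see why this works, it may help to look at the intermediate result. The sketch below uses the sample data from the question and relies on the default melt column names (variable, value); the names melted and totals are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['x', 'y', 'z'], 'col2': [1, 2, 3], 'col3': ['a', 'a', 'b']})

# melt stacks the values of col1 and col3 into a single 'value' column,
# so every col2 number appears twice: once under its row label (x, y, z)
# and once under its group label (a, b)
melted = df.melt(id_vars='col2')
print(melted)

# summing per label therefore yields both the row values and the group totals
totals = melted.groupby('value')['col2'].sum()
print(totals)
```

Note that the result is ordered alphabetically by label (a, b, x, y, z), so the group totals are not interleaved above their rows as in the requested output.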
CodePudding user response:
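Use a custom function that adds a summary row on top of each group:
def f(x):
    # one-row frame holding the group label (x.name) and the group sum
    header = pd.DataFrame({'col1': x.name, 'col2': x['col2'].sum()}, index=[0])
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    return pd.concat([header, x[['col1', 'col2']]])
df = (df.sort_values(by='col3')
        .groupby('col3', group_keys=False)
        .apply(f)
        .reset_index(drop=True))
print(df)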
col1 col2
0 a 3
1 x 1
2 y 2
3 b 3
4 z 3
A more performant solution is to use GroupBy.ngroup for the index values, aggregate with sum, and finally join both parts with concat, sorting the index with the stable mergesort algorithm so the summary rows stay on top of their groups:
df = df.sort_values(by=['col3'])
df1 = df.groupby(by=['col3'])['col2'].sum().rename_axis('col1').reset_index()
df2 = df.set_index(df.groupby(by=['col3']).ngroup())
# the positional axis argument of drop was removed in pandas 2.0, so use columns=
df = pd.concat([df1, df2]).sort_index(kind='mergesort', ignore_index=True).drop(columns='col3')
print(df)
col1 col2
0 a 3
1 x 1
2 y 2
3 b 3
4 z 3
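The same steps can be wrapped in a small reusable function. This is only a sketch: the function name interleave_group_totals and its parameters are hypothetical, and it assumes the value column is numeric and the group keys are sortable.

```python
import pandas as pd

def interleave_group_totals(df, group_col, label_col, value_col):
    """Hypothetical helper: put each group's total row above that group's rows."""
    df = df.sort_values(by=group_col, kind='mergesort')
    # one total row per group, labelled with the group key in label_col
    totals = (df.groupby(group_col)[value_col].sum()
                .rename_axis(label_col)
                .reset_index())
    # index every detail row by the integer position of its group (0, 1, ...)
    details = df[[label_col, value_col]].set_index(df.groupby(group_col).ngroup())
    # totals come first in the concat, so a stable sort keeps them on top
    return (pd.concat([totals, details])
              .sort_index(kind='mergesort', ignore_index=True))

sample = pd.DataFrame({'col1': ['x', 'y', 'z'], 'col2': [1, 2, 3], 'col3': ['a', 'a', 'b']})
result = interleave_group_totals(sample, 'col3', 'col1', 'col2')
print(result)
```

The stable sort matters here: each total row shares its index value with the rows of its group, and only a stable algorithm such as mergesort guarantees that rows with equal index values keep their concat order, with the total first.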