How to add interleaving rows as a result of sort / groups?

I have the following sample input data:

import pandas as pd
df = pd.DataFrame({'col1': ['x', 'y', 'z'], 'col2': [1, 2, 3], 'col3': ['a', 'a', 'b']})

I would like to sort and group by col3 while interleaving the summaries on top of the corresponding group in col1 and get the following output:

    col1  col2
0     a      3
1     x      1
2     y      2
3     b      3
4     z      3

I can of course do the aggregation part:

df.sort_values(by=['col3']).groupby(by=['col3']).sum()

      col2
col3      
  a      3
  b      3

but I am not sure how to interleave the group labels on top of col1.

CodePudding user response：

What about:

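(df.melt(id_vars='col2')               # stack col1 and col3 into a single 'value' column
   .rename(columns={'value': 'col1'})
   .groupby('col1')['col2'].sum()      # select col2 so the string 'variable' column is not summed
   .reset_index()
)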

output:

  col1  col2
0    a     3
1    b     3
2    x     1
3    y     2
4    z     3

CodePudding user response：

Use a custom function that adds a summary row on top of each group:

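def f(x):
    # x.name is the group key (the col3 value); build a one-row summary for it
    top = pd.DataFrame({'col1': x.name, 'col2': x['col2'].sum()}, index=[0])
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    return pd.concat([top, x])

df = (df.sort_values(by='col3')
        .groupby(by='col3', group_keys=False)
        .apply(f)
        # errors='ignore' because newer pandas may not pass col3 to f at all
        .drop(columns='col3', errors='ignore')
        .reset_index(drop=True))
print(df)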
  col1  col2
0    a     3
1    x     1
2    y     2
3    b     3
4    z     3

A more performant solution is to use GroupBy.ngroup for the index, aggregate with sum, and finally join the two frames with concat, sorting the index with a stable algorithm (mergesort):

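df = df.sort_values(by=['col3'])

# one summary row per group; the group key becomes col1
df1 = df.groupby(by=['col3'])['col2'].sum().rename_axis('col1').reset_index()
# index the original rows by their group number (0, 1, ...), which matches df1's default index
df2 = df.set_index(df.groupby(by=['col3']).ngroup())

# df1 comes first in the concat, so a stable sort keeps each summary row above its group
df = pd.concat([df1, df2]).sort_index(kind='mergesort', ignore_index=True).drop(columns='col3')
print(df)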
  col1  col2
0    a     3
1    x     1
2    y     2
3    b     3
4    z     3
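
If you need this for other column names, the concat approach can be wrapped in a small helper. This is only a sketch: the function name interleave_group_totals and its parameters are made up for illustration, and it assumes the group column has no missing values and that sum is the summary you want.

```python
import pandas as pd


def interleave_group_totals(df, group_col, label_col, value_col):
    """Put one summary row (group key + sum) on top of each group's rows.

    Hypothetical helper, not part of pandas.
    """
    # Sort by the group key; a stable sort keeps the original order inside each group.
    ordered = df.sort_values(group_col, kind="mergesort")
    # Group numbers 0, 1, 2, ... in sorted key order, one per row.
    group_id = ordered.groupby(group_col).ngroup()

    # One row per group: the key goes into label_col, the sum into value_col.
    totals = (ordered.groupby(group_col)[value_col].sum()
                     .rename_axis(label_col)
                     .reset_index())
    # The original rows, indexed by their group number.
    members = ordered[[label_col, value_col]].set_index(group_id)

    # totals is listed first, so a stable sort on the index keeps
    # each summary row directly above the rows of its group.
    return (pd.concat([totals, members])
              .sort_index(kind="mergesort")
              .reset_index(drop=True))


df = pd.DataFrame({'col1': ['x', 'y', 'z'], 'col2': [1, 2, 3], 'col3': ['a', 'a', 'b']})
result = interleave_group_totals(df, 'col3', 'col1', 'col2')
print(result)
```

The input frame is not modified, and the group column is never carried into the result, so no drop is needed at the end.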