GroupBy with Maximal Group Size

Time:06-29

I have a dataframe df that looks like this:

column_a  ...
1         
1         
1
2
3
3
3
3
3

I want to group the dataframe by column_a, but no resulting group should contain more than s rows. For s=2, the groups should be: (1,1), (1), (2), (3,3), (3,3), (3).

I have this working with a simple loop over the grouped dataframe (df.groupby(['column_a'])) that splits any group that is too big, but I suspect there is a shorter and more elegant way to do this.
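For reference, the loop-based approach described above might look roughly like the sketch below. The helper name split_groups and its parameters are hypothetical and not taken from the original code; it simply slices each group into consecutive pieces of at most s rows.

```python
import pandas as pd


def split_groups(df, column, s):
    """Hypothetical helper: return a list of sub-frames with at most s rows each."""
    chunks = []
    for _, group in df.groupby(column):
        # Slice each group into consecutive pieces of length s (the last may be shorter)
        for start in range(0, len(group), s):
            chunks.append(group.iloc[start:start + s])
    return chunks


df = pd.DataFrame({'column_a': [1, 1, 1, 2, 3, 3, 3, 3, 3]})
chunks = split_groups(df, 'column_a', 2)
sizes = [len(c) for c in chunks]
```

With the example data and s=2, this should yield six chunks matching the groups listed above.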

Does anyone know a short and elegant method to group with a limited group size?

CodePudding user response:

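You could group by a together with the groupby cumcount floor-divided by s. cumcount numbers the rows within each a group starting from 0, so floor division by s gives each row a chunk index within its group.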

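import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 3, 3, 3]})
s = 2  # maximum group size
# cumcount() numbers the rows within each 'a' group; // s turns that into a chunk index
df.groupby(['a', df.groupby('a').cumcount() // s]).size()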

Output

a   
1  0    2
   1    1
2  0    1
3  0    2
   1    2
   2    1
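
To see why this works and to recover the groups themselves rather than only their sizes, the intermediate steps can be spelled out. This is a minimal sketch using the same example data; the variable names position, chunk, and groups are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 3, 3, 3]})
s = 2  # maximum group size

# Position of each row within its 'a' group: 0, 1, 2, 0, 0, 1, 2, 3, 4
position = df.groupby('a').cumcount()
# Floor division turns positions into chunk numbers: 0, 0, 1, 0, 0, 0, 1, 1, 2
chunk = (position // s).rename('chunk')

# Iterate over the (a, chunk) groups and collect the values in each one
groups = [tuple(g['a'].tolist()) for _, g in df.groupby(['a', chunk])]
print(groups)  # [(1, 1), (1,), (2,), (3, 3), (3, 3), (3,)]
```

The printed groups match the ones requested in the question for s=2.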