I have a dataframe df that looks like this:
column_a ...
1
1
1
2
3
3
3
3
3
I now want to group the dataframe based on column_a, but the resulting groups should be no greater in size than s.
So, for s=2, the groups should be:
(1,1), (1), (2), (3,3), (3,3), (3).
I have this working with a simple loop over the grouped dataframe (df.groupby(['column_a'])), splitting the groups if they are too big, but I have the feeling there is a shorter and more elegant way to do this.
Does anyone know a short and elegant method to group with a limited group size?
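For reference, the loop-based approach described above might look something like this (a sketch, assuming the dataframe and s from the question; the chunking with iloc is one possible way to do the splitting):

```python
import pandas as pd

df = pd.DataFrame({'column_a': [1, 1, 1, 2, 3, 3, 3, 3, 3]})
s = 2

# Loop over the groups and split each one into chunks of at most s rows
chunks = []
for _, group in df.groupby(['column_a']):
    for start in range(0, len(group), s):
        chunks.append(group.iloc[start:start + s])

print([tuple(c['column_a']) for c in chunks])
# [(1, 1), (1,), (2,), (3, 3), (3, 3), (3,)]
```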
CodePudding user response:
It seems like you could group by a and the floor division of the groupby cumcount by s.
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 3, 3, 3]})
s = 2

# cumcount numbers the rows within each group (0, 1, 2, ...);
# floor-dividing by s splits each group into sub-groups of at most s rows
df.groupby(['a', df.groupby('a').cumcount() // s]).size()
Output
a
1  0    2
   1    1
2  0    1
3  0    2
   1    2
   2    1
dtype: int64
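To recover the actual groups (matching the tuples from the question) rather than just their sizes, the same key can be passed to groupby and iterated over. A minimal sketch, using the df and s defined above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 3, 3, 3]})
s = 2

# Sub-group key: within-group position floor-divided by s
key = df.groupby('a').cumcount() // s

# Grouping by both the column and the key yields the size-limited groups
groups = [tuple(g['a']) for _, g in df.groupby(['a', key])]
print(groups)
# [(1, 1), (1,), (2,), (3, 3), (3, 3), (3,)]
```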