Suppose I have a DataFrame like this:
a b
i1 t1
i1 t2
i2 t3
i2 t1
i2 t3
i3 t2
I want to group df by "a" and then select the two largest groups. Specifically, I want these resulting rows:
a b
i2 t3
i2 t1
i2 t3
i1 t1
i1 t2
I tried:
df.groupby("a").head(2)
But it seems to select the first two rows of each group.
CodePudding user response:
Example
import pandas as pd

data = {'a': {0: 'i1', 1: 'i1', 2: 'i2', 3: 'i2', 4: 'i2', 5: 'i3'},
        'b': {0: 't1', 1: 't2', 2: 't3', 3: 't1', 4: 't3', 5: 't2'}}
df = pd.DataFrame(data)
Code
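# labels of the two most frequent values in column 'a'
lst = df['a'].value_counts()[:2].index
# keep only the rows that belong to those groups
out = df[df['a'].isin(lst)]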
out
a b
0 i1 t1
1 i1 t2
2 i2 t3
3 i2 t1
4 i2 t3
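To see why this works, it can help to look at the intermediate result: value_counts() returns the group sizes sorted from largest to smallest, so the first two index labels are the two largest groups. A small sketch on the same example data (the names counts and top_labels are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["i1", "i1", "i2", "i2", "i2", "i3"],
    "b": ["t1", "t2", "t3", "t1", "t3", "t2"],
})

# Group sizes, sorted by count in descending order
counts = df["a"].value_counts()

# Labels of the two largest groups, largest first
top_labels = counts.head(2).index.tolist()

print(counts)
print(top_labels)
```

Note that when two groups have the same size, which of them makes the cut depends on how value_counts() orders ties, so the selection is not guaranteed in that case.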
If you also want the rows sorted by group size (largest group first), use the following code:
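# labels of the two largest groups, largest first
lst = df['a'].value_counts()[:2].index
# map each label to its rank (0 = largest group)
m = pd.Series(range(0, 2), index=lst)
# keep those groups and order the rows by group rank
out = df[df['a'].isin(lst)].sort_values('a', key=lambda x: m[x])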
out
a b
2 i2 t3
3 i2 t1
4 i2 t3
0 i1 t1
1 i1 t2
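The same idea can be wrapped in a small function for any column and any number of groups. This is only a sketch: top_n_groups is a hypothetical helper name, not a pandas function, and kind="stable" is my addition so that rows keep their original order inside each group (the default sort algorithm does not promise that).

```python
import pandas as pd


def top_n_groups(df, col, n=2):
    """Hypothetical helper: keep the rows of the n largest groups in `col`,
    ordered from the largest group to the smallest."""
    # value_counts() sorts by count in descending order
    top = df[col].value_counts().head(n).index
    # Map each label to its rank (0 = largest group)
    rank = pd.Series(range(len(top)), index=top)
    kept = df[df[col].isin(top)]
    # Stable sort: ties on rank keep the original row order
    return kept.sort_values(col, key=lambda s: s.map(rank), kind="stable")


df = pd.DataFrame({
    "a": ["i1", "i1", "i2", "i2", "i2", "i3"],
    "b": ["t1", "t2", "t3", "t1", "t3", "t2"],
})

out = top_n_groups(df, "a", n=2)
print(out)
```

If you only need the number of resulting rows, len(out) gives it directly.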