Pandas: Select top n groups

Time:12-30

Suppose I have a dataframe like

a   b
i1  t1
i1  t2
i2  t3
i2  t1
i2  t3
i3  t2

I want to group df by "a" and then select the 2 largest groups. Specifically, I want the result to contain these rows:

a   b
i2  t3
i2  t1
i2  t3
i1  t1
i1  t2

I tried:

df.groupby("a").head(2)   

But this seems to select the first two rows of each group instead.

CodePudding user response:

Example

import pandas as pd

data = {'a': {0: 'i1', 1: 'i1', 2: 'i2', 3: 'i2', 4: 'i2', 5: 'i3'},
        'b': {0: 't1', 1: 't2', 2: 't3', 3: 't1', 4: 't3', 5: 't2'}}
df = pd.DataFrame(data)

Code

lst = df['a'].value_counts()[:2].index
out = df[df['a'].isin(lst)]

out

     a  b
0   i1  t1
1   i1  t2
2   i2  t3
3   i2  t1
4   i2  t3
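To see why this differs from the attempt in the question, the two approaches can be compared side by side. This is a minimal sketch on the same example data; the variable names per_group, counts and top_groups are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["i1", "i1", "i2", "i2", "i2", "i3"],
    "b": ["t1", "t2", "t3", "t1", "t3", "t2"],
})

# head(2) works per group: it keeps at most the first 2 rows of EVERY group,
# so the small group i3 survives and i2 loses its third row.
per_group = df.groupby("a").head(2)

# value_counts() counts rows per value of "a", largest count first.
counts = df["a"].value_counts()

# Keeping only rows whose "a" is among the 2 most frequent values
# selects whole groups instead.
top_groups = df[df["a"].isin(counts.head(2).index)]
```

Here per_group still contains the i3 row, while top_groups contains all rows of i1 and i2 and nothing else.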

If you also want the groups sorted by size (largest first), use the following code:

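# value_counts() sorts by count, so the first 2 labels are the 2 largest groups
lst = df['a'].value_counts()[:2].index
# map each label to its rank (0 = largest group)
m = pd.Series(range(0, 2), index=lst)
out = df[df['a'].isin(lst)].sort_values('a', key=lambda x: x.map(m), kind='stable')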

out

    a   b
2   i2  t3
3   i2  t1
4   i2  t3
0   i1  t1
1   i1  t2
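The same idea can be wrapped in a small helper for any n. This is only a sketch, not part of the answer above: the name top_n_groups and its parameters are hypothetical. It assumes that ties in group size may be broken in whatever order value_counts() returns them.

```python
import pandas as pd


def top_n_groups(df, col, n):
    """Return the rows of the n largest groups of `col`, largest group first.

    Hypothetical helper for illustration only.
    """
    # value_counts() sorts by count in descending order, so the first n
    # index labels are the n most frequent values.
    top = df[col].value_counts().head(n).index
    # Map each label to its rank (0 = largest group).
    rank = pd.Series(range(len(top)), index=top)
    kept = df[df[col].isin(top)]
    # A stable sort keeps the original row order inside each group.
    return kept.sort_values(col, key=lambda s: s.map(rank), kind="stable")


df = pd.DataFrame({
    "a": ["i1", "i1", "i2", "i2", "i2", "i3"],
    "b": ["t1", "t2", "t3", "t1", "t3", "t2"],
})
out = top_n_groups(df, "a", 2)
```

With the example data, out holds the three i2 rows followed by the two i1 rows, each in their original order.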