Keep a certain number of rows from a specific group in pandas

Time:01-03

I have a dataframe with several columns. I want to sort by city and keep a certain number of rows where 'city' == 'Buenos Aires' and a different number of rows where 'city' == 'Paris'. What is the best way to do this? I have found a way to keep the same number of rows for every group, but I need a different number of rows per group.

    city            number
0   Buenos Aires    14
1   Paris           23
2   Barcelona       12
3   Buenos Aires    14
4   Buenos Aires    14
... ...             ...

CodePudding user response:

Use groupby.apply with a dictionary that maps each city to the number of rows to keep (cities missing from the dictionary get 0 rows):

d = {'Buenos Aires': 2, 'Paris': 3}

out = df.groupby('city').apply(lambda g: g.head(d.get(g.name, 0)))

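NB. For random rows, use sample in place of head. Note that sample raises an error if asked for more rows than the group contains, unless replace=True.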
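For anyone who wants to try this end to end, here is a minimal sketch of the same idea written as an explicit loop over the groups instead of apply. The sample data is made up to resemble the question's table, and keep_per_group is a hypothetical helper name, not a pandas function. It caps the requested count at the group size so the random variant does not fail on small groups.

```python
import pandas as pd

# Made-up data resembling the question's table (an assumption, not the real data).
df = pd.DataFrame({
    "city": ["Buenos Aires", "Paris", "Barcelona", "Buenos Aires",
             "Buenos Aires", "Paris", "Paris", "Paris"],
    "number": [14, 23, 12, 14, 14, 23, 23, 23],
})
d = {"Buenos Aires": 2, "Paris": 3}


def keep_per_group(frame, limits, key="city", random=False, seed=0):
    """Hypothetical helper: keep limits[group] rows per group, 0 if absent."""
    parts = []
    for name, g in frame.groupby(key):
        # Never ask for more rows than the group actually has.
        n = min(limits.get(name, 0), len(g))
        if n == 0:
            continue
        parts.append(g.sample(n=n, random_state=seed) if random else g.head(n))
    # Return an empty frame with the same columns if nothing was kept.
    return pd.concat(parts) if parts else frame.iloc[0:0]


first_rows = keep_per_group(df, d)                # first n rows of each city
random_rows = keep_per_group(df, d, random=True)  # n random rows of each city
print(first_rows)
```

With this data, first_rows holds the rows with index 0 and 3 (Buenos Aires) followed by 1, 5 and 6 (Paris); Barcelona is dropped because it is not in the dictionary.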

Alternative with groupby.cumcount:

d = {'Buenos Aires': 2, 'Paris': 3}

# cumcount is 0-based, so keep the rows whose position in the group is below the limit
out = (df[df['city'].map(d).gt(df.groupby('city').cumcount())]
       .sort_values(by='city')
      )
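A runnable sketch of the cumcount approach, split into named steps so the comparison is easy to check. The data is made up to resemble the question's table. Since cumcount numbers the rows of each group starting at 0, a row is kept when its position is strictly less than the limit for its city; cities absent from the dictionary map to NaN, which compares as False.

```python
import pandas as pd

# Made-up data resembling the question's table (an assumption, not the real data).
df = pd.DataFrame({
    "city": ["Buenos Aires", "Paris", "Barcelona", "Buenos Aires",
             "Buenos Aires", "Paris", "Paris", "Paris"],
    "number": [14, 23, 12, 14, 14, 23, 23, 23],
})
d = {"Buenos Aires": 2, "Paris": 3}

limit = df["city"].map(d)                 # rows to keep per city, NaN if not in d
position = df.groupby("city").cumcount()  # 0, 1, 2, ... within each city
mask = position.lt(limit)                 # True for the first `limit` rows of a city

# A stable sort keeps the original row order inside each city.
out = df[mask].sort_values(by="city", kind="stable")
print(out)
```

This avoids calling a Python function once per group, which tends to matter when there are many groups.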