I have a dataframe with several columns. I want to sort it by city and keep a certain number of rows where 'city' == 'Buenos Aires' and a different number of rows where 'city' == 'Paris'. What is the best way to do this? Here is a way to keep the same number of rows for each group, but I want a different number of rows per group.
city number
0 Buenos Aires 14
1 Paris 23
2 Barcelona 12
3 Buenos Aires 14
4 Buenos Aires 14
... ... ...
CodePudding user response:
Use groupby.apply with a dictionary of the number of values to keep:
d = {'Buenos Aires': 2, 'Paris': 3}
out = df.groupby('city').apply(lambda g: g.head(d.get(g.name, 0)))
NB. For random rows, use sample in place of head.
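One caveat with sample: it raises an error when asked for more rows than a group contains, so the requested count needs capping at the group size. Below is a minimal sketch of that, looping over the groups rather than using apply so the city name is explicit. The data is made up for illustration, and random_state is only there to make the draw repeatable.

```python
import pandas as pd

# Made-up data, loosely modelled on the frame in the question
df = pd.DataFrame({
    'city': ['Buenos Aires', 'Paris', 'Barcelona', 'Buenos Aires',
             'Buenos Aires', 'Paris', 'Paris', 'Paris', 'Paris'],
    'number': [14, 23, 12, 14, 14, 23, 23, 23, 23],
})
d = {'Buenos Aires': 2, 'Paris': 3}

parts = []
for name, g in df.groupby('city'):
    # cap at the group size; cities missing from d get 0 rows
    n = min(len(g), d.get(name, 0))
    if n > 0:
        parts.append(g.sample(n=n, random_state=0))

out = pd.concat(parts)
print(out)
```

Cities that are not in the dictionary (Barcelona here) are dropped entirely, which matches the d.get(g.name, 0) default in the head version.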
Alternative with groupby.cumcount:
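d = {'Buenos Aires': 2, 'Paris': 3}
# cumcount numbers rows 0, 1, 2, ... within each city;
# keep a row while the city's limit is greater than that position
out = (df[df['city'].map(d).gt(df.groupby('city').cumcount())]
       .sort_values(by='city')
      )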
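To see the cumcount idea end to end, here is a small self-contained sketch on made-up data. It keeps a row while its position within its city is below that city's limit. The names limit and position are just illustrative, and kind='stable' is an addition to preserve the original row order within each city.

```python
import pandas as pd

# Made-up data, loosely modelled on the frame in the question
df = pd.DataFrame({
    'city': ['Buenos Aires', 'Paris', 'Barcelona', 'Buenos Aires',
             'Buenos Aires', 'Paris', 'Paris', 'Paris', 'Paris'],
    'number': [14, 23, 12, 14, 14, 23, 23, 23, 23],
})
d = {'Buenos Aires': 2, 'Paris': 3}

limit = df['city'].map(d)                  # NaN for cities missing from d
position = df.groupby('city').cumcount()   # 0, 1, 2, ... within each city

# NaN compares as False, so cities without a limit are dropped
out = df[limit.gt(position)].sort_values(by='city', kind='stable')
print(out)
```

Unlike the apply version, this keeps the original index and avoids calling a Python function per group, which may matter on large frames.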