I was wondering if there's a way to sort a DataFrame by a numeric column but keep at most X occurrences per value of another column. For example, say I want to use a DataFrame as a catalog and paginate it so that every page has 5 items, and within every 5 items there can be at most 2 items from the same value of the categorical column (here, the seller).
product seller
10 seller1
9 seller1
8 seller2
7 seller2
6 seller2
5 seller3
And then I would want something like:
product seller
10 seller1
9 seller1
8 seller2
7 seller2
5 seller3
6 seller2
The last 2 rows swap places because on the fixed "page 1" (rows 1-5), seller2 already has 2 items.
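To make the rule concrete, here is a small sketch of a check for it. The function `pages_respect_cap` is a hypothetical helper (not part of pandas); it assumes pages are fixed consecutive chunks of `page_size` items, as in the question.

```python
from collections import Counter
from typing import Sequence


def pages_respect_cap(sellers: Sequence[str], page_size: int = 5, cap: int = 2) -> bool:
    """Return True if every page of `page_size` consecutive items
    contains at most `cap` items from the same seller."""
    for start in range(0, len(sellers), page_size):
        # count sellers on this page only
        page = Counter(sellers[start:start + page_size])
        if max(page.values()) > cap:
            return False
    return True


original = ["seller1", "seller1", "seller2", "seller2", "seller2", "seller3"]
wanted = ["seller1", "seller1", "seller2", "seller2", "seller3", "seller2"]
print(pages_respect_cap(original))  # False: page 1 has three seller2 items
print(pages_respect_cap(wanted))    # True
```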
Answer:
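Let us try with groupby().cumcount(): number each seller's rows 0, 1, 2, ..., integer-divide by 2 so that each seller's first two items get key 0, the next two get key 1, and so on, then sort by that key.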
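# assumes df is already sorted by product, highest first (as in the question); kind='stable' keeps that order within each key
out = df.sort_values(by='seller', key=lambda s: s.groupby(s).cumcount() // 2, kind='stable')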
Output:
product seller
0 10 seller1
1 9 seller1
2 8 seller2
3 7 seller2
5 5 seller3
4 6 seller2
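The cumcount key groups each seller's items into blocks of 2 but does not know where page boundaries fall, so when a block straddles two pages a seller can still end up with more than 2 items on one page. If the page boundaries matter, a greedy page-filling pass is one alternative. This is a different technique from the answer above, and `paginate_with_cap` is a hypothetical helper, not a pandas function. It is a sketch that assumes a unique index and modest data sizes (it is O(n²)).

```python
import pandas as pd


def paginate_with_cap(df: pd.DataFrame, score: str, group: str,
                      page_size: int = 5, cap: int = 2) -> pd.DataFrame:
    """Greedily order rows by `score` (highest first) so that each page of
    `page_size` rows holds at most `cap` rows per value of `group`,
    whenever a compliant row is still available."""
    # candidate rows in score order; the stable sort keeps ties in original order
    remaining = df.sort_values(score, ascending=False, kind="stable").index.tolist()
    order = []
    while remaining:
        counts = {}  # rows already placed on the current page, per group value
        for _ in range(min(page_size, len(remaining))):
            # highest-scoring remaining row whose group is still under the cap;
            # if none qualifies, fall back to the highest-scoring remaining row
            # (the cap is then exceeded) so the page stays page_size long
            pick = next((i for i in remaining
                         if counts.get(df.at[i, group], 0) < cap), remaining[0])
            remaining.remove(pick)
            order.append(pick)
            key = df.at[pick, group]
            counts[key] = counts.get(key, 0) + 1
    return df.loc[order]


df = pd.DataFrame({"product": [10, 9, 8, 7, 6, 5],
                   "seller": ["seller1", "seller1", "seller2",
                              "seller2", "seller2", "seller3"]})
out = paginate_with_cap(df, "product", "seller")
print(out)
```

On the question's data this gives the same order as the cumcount answer (10, 9, 8, 7, 5, 6). Because it is greedy, it does not search for a better split: when one seller dominates the lowest-ranked rows, later pages can still exceed the cap.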