Say I have a dataframe:
import math
import pandas as pd

dict_ = {
    'Query': ['apple', 'banana', 'mango', 'bat', 'cat', 'rat', 'lion', 'potato', 'london', 'new jersey'],
    'Category': ['fruits', 'fruits', 'fruits', 'animal', 'animal', 'animal', 'animal', 'veggie', 'place', 'place'],
}
df = pd.DataFrame(dict_)
After replicating each group so that it has as many rows as the largest group, I see that the resulting dataframe has been sorted by the group key (Category) instead of keeping the original group order.
rep_val = df.groupby('Category').size().max()
df.groupby('Category').apply(lambda d: pd.concat([d] * math.ceil(rep_val / d.shape[0])).head(rep_val)).reset_index(drop=True)
Query Category
0 bat animal
1 cat animal
2 rat animal
3 lion animal
4 apple fruits
5 banana fruits
6 mango fruits
7 apple fruits
8 london place
9 new jersey place
10 london place
11 new jersey place
12 potato veggie
13 potato veggie
14 potato veggie
15 potato veggie
However, I expected the groups to appear in the order in which they first occur in the original dataframe, so the categories would be fruits, animal, veggie, place, each followed by its corresponding values.
Expected output:
Query Category
0 apple fruits
1 banana fruits
2 mango fruits
3 apple fruits
4 bat animal
5 cat animal
6 rat animal
7 lion animal
8 potato veggie
9 potato veggie
10 potato veggie
11 potato veggie
12 london place
13 new jersey place
14 london place
15 new jersey place
CodePudding user response:
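Set sort=False in the groupby call. By default, groupby sorts the group keys, which is why the categories came out in alphabetical order; with sort=False the groups keep the order in which they first appear.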
CODE
rep_val = df.groupby('Category', sort=False).size().max()
df = df.groupby('Category', sort=False).apply(lambda d: pd.concat(([d] * math.ceil(rep_val / d.shape[0]))).head(rep_val)).reset_index(drop=True)
OUTPUT
Query Category
0 apple fruits
1 banana fruits
2 mango fruits
3 apple fruits
4 bat animal
5 cat animal
6 rat animal
7 lion animal
8 potato veggie
9 potato veggie
10 potato veggie
11 potato veggie
12 london place
13 new jersey place
14 london place
15 new jersey place
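For reference, here is a self-contained sketch of the same idea that loops over the groups instead of using groupby(...).apply. This is only one possible way to write it, not the approach from the answer above: the helper name replicate_groups and its key parameter are made up for illustration. It relies on sort=False making the groupby yield groups in order of first appearance.

```python
import math

import pandas as pd

df = pd.DataFrame({
    "Query": ["apple", "banana", "mango", "bat", "cat", "rat", "lion",
              "potato", "london", "new jersey"],
    "Category": ["fruits", "fruits", "fruits", "animal", "animal", "animal",
                 "animal", "veggie", "place", "place"],
})


def replicate_groups(frame, key="Category"):
    """Hypothetical helper: pad every group up to the size of the largest one.

    Groups are emitted in order of first appearance because sort=False is
    passed to groupby.
    """
    # Number of rows in the largest group; every group is padded to this size.
    target = int(frame.groupby(key, sort=False).size().max())
    parts = []
    for _, group in frame.groupby(key, sort=False):
        # Repeat the group enough times to reach the target, then trim.
        repeats = math.ceil(target / len(group))
        parts.append(pd.concat([group] * repeats).head(target))
    # ignore_index=True renumbers the rows 0..n-1, like reset_index(drop=True).
    return pd.concat(parts, ignore_index=True)


result = replicate_groups(df)
print(result)
```

Because each group is taken straight from the loop, the Category column stays in the result regardless of how a given pandas version treats grouping columns inside apply.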