Have got an input dataframe df
like below which holds data with multiple category-main_group-sub_group
. Here as a sample input have provided for single category and single main_group.
So the thing is to reorder rows among sub_group value for each Category-main_group
combination[For eg., Here we got two subgroups KIWI FRUIT
and MANDARIN
for Fruit-CITRUS
combination]. On considering Type column, each sub_group and type
combination need to be placed alternatively as shown in expected output.
Input Dataframe df:
Category main_group sub_group Item Type item_order row
Fruit CITRUS KIWI FRUIT GreenKiwi Loose 1 row_1
Fruit CITRUS KIWI FRUIT GoldKiwi Loose 2 row_1
Fruit CITRUS KIWI FRUIT OtherKiwi Loose 3 row_1
Fruit CITRUS KIWI FRUIT PP GreenKiwi PP 4 row_1
Fruit CITRUS KIWI FRUIT PP GoldKiwi PP 5 row_1
Fruit CITRUS KIWI FRUIT PP OtherKiwi PP 6 row_1
Fruit CITRUS MANDARIN MandarinAfourer Loose 7 row_1
Fruit CITRUS MANDARIN MandarinTangold Loose 8 row_1
Fruit CITRUS MANDARIN PP Mandarin PP 9 row_1
Fruit MANGOES NECTARINES NectaYellow Loose 10 row_1
Fruit MANGOES NECTARINES NectaWhite Loose 11 row_1
Fruit MANGOES PEACHES PeachYellow Loose 12 row_1
Fruit MANGOES PEACHES PeachWhite Loose 13 row_1
Vegg TOM/CAP TOMATO Tomato Truss Loose 14 row_2
Vegg TOM/CAP TOMATO Tomato Roma Loose 15 row_2
Vegg TOM/CAP CUCUMBER Capsicum Mini Loose 16 row_2
Vegg TOM/CAP CUCUMBER Capsicum Red Loose 17 row_2
Vegg ONION ONION Onion Red Loose 18 row_2
Vegg ONION ONION PP Onion PP 19 row_2
Expected Output:
Category main_group sub_group Item Type item_order old_item_order row
Fruit CITRUS KIWI FRUIT GreenKiwi Loose 1 1 row_1
Fruit CITRUS MANDARIN MandarinAfourer Loose 2 7 row_1
Fruit CITRUS KIWI FRUIT GoldKiwi Loose 3 2 row_1
Fruit CITRUS MANDARIN MandarinTangold Loose 4 8 row_1
Fruit CITRUS KIWI FRUIT OtherKiwi Loose 5 3 row_1
Fruit CITRUS KIWI FRUIT PP GreenKiwi PP 6 4 row_1
Fruit CITRUS MANDARIN PP Mandarin PP 7 9 row_1
Fruit CITRUS KIWI FRUIT PP GoldKiwi PP 8 5 row_1
Fruit CITRUS KIWI FRUIT PP OtherKiwi PP 9 6 row_1
Fruit MANGOES NECTARINES NectaYellow Loose 10 10 row_1
Fruit MANGOES PEACHES PeachYellow Loose 11 12 row_1
Fruit MANGOES NECTARINES NectaWhite Loose 12 11 row_1
Fruit MANGOES PEACHES PeachWhite Loose 13 13 row_1
Vegg TOM/CAP TOMATO Tomato Truss Loose 14 14 row_2
Vegg TOM/CAP CUCUMBER Capsicum Mini Loose 15 16 row_2
Vegg TOM/CAP TOMATO Tomato Roma Loose 16 15 row_2
Vegg TOM/CAP CUCUMBER Capsicum Red Loose 17 17 row_2
Vegg ONION ONION Onion Red Loose 18 18 row_2
Vegg ONION ONION PP Onion PP 19 19 row_2
Additional Condition added on edit: this sorting reorder process need to be done only if df['main_group'] == ['CITRUS','MANGOES','TOM/CAP']
, other groups should not be reordered and keep the order as it is(like here 'ONION' need to be excluded).
Kindly help me through this. Thanks in Advance!
CodePudding user response:
You can use groupy.cumcount
to compute a sub-order of the sub_group. Then sort by type and this order.
Example using the efficient np.lexsort
:
out = df.iloc[np.lexsort([df.groupby(['Type', 'sub_group']).cumcount(), df['Type']])]
With pandas' sort_values
:
(df.assign(sub_order=df.groupby(['Type', 'sub_group']).cumcount())
.sort_values(by=['Type', 'sub_order'])
)
Output:
Category main_group sub_group Item Type item_order row
0 Fruit CITRUS KIWI FRUIT GreenKiwi Loose 1 row_1
6 Fruit CITRUS MANDARIN MandarinAfourer Loose 7 row_1
1 Fruit CITRUS KIWI FRUIT GoldKiwi Loose 2 row_1
7 Fruit CITRUS MANDARIN MandarinTangold Loose 8 row_1
2 Fruit CITRUS KIWI FRUIT OtherKiwi Loose 3 row_1
3 Fruit CITRUS KIWI FRUIT PP GreenKiwi PP 4 row_1
8 Fruit CITRUS MANDARIN PP Mandarin PP 9 row_1
4 Fruit CITRUS KIWI FRUIT PP GoldKiwi PP 5 row_1
5 Fruit CITRUS KIWI FRUIT PP OtherKiwi PP 6 row_1
update with more columns:
out = df.iloc[np.lexsort([df.groupby(['Type', 'sub_group']).cumcount(),
df['Type'], df['main_group'], df['Category']])]
Or:
(df.assign(sub_order=df.groupby(['Type', 'sub_group']).cumcount())
.sort_values(by=['Category', 'main_group', 'Type', 'sub_order'])
)