I have a df that has this structure:
1_product_id  2_product_id  1_qty_sold  2_qty_sold  times_sold
        5584          4384      159.00      653.00         153
        7889          2970      104.00      497.00         102
        5024          2970       89.00      413.00          87
        2990          8310       71.00      283.00          71
        2990          4384       71.00      282.00          68
        2990          2970       62.00      240.00          58
        5584          8310       56.00      208.00          54
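For reference, the frame can be rebuilt with this snippet (column dtypes assumed from the printout):

import pandas as pd

df_out = pd.DataFrame({
    '1_product_id': [5584, 7889, 5024, 2990, 2990, 2990, 5584],
    '2_product_id': [4384, 2970, 2970, 8310, 4384, 2970, 8310],
    '1_qty_sold':   [159.0, 104.0, 89.0, 71.0, 71.0, 62.0, 56.0],
    '2_qty_sold':   [653.0, 497.0, 413.0, 283.0, 282.0, 240.0, 208.0],
    'times_sold':   [153, 102, 87, 71, 68, 58, 54],
})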
I am trying to make it look like this:
1_product_id  2_product_id  1_qty_sold  2_qty_sold  times_sold
        5584          4384      159.00      653.00         153
                      8310       56.00      208.00          54
        7889          2970      104.00      497.00         102
        5024          2970       89.00      413.00          87
        2990          8310       71.00      283.00          71
                      4384       71.00      282.00          68
                      2970       62.00      240.00          58
where it is sorted by times_sold and grouped by 1_product_id and 2_product_id. I tried:
df_out.groupby(['1_product_id', '2_product_id']).sum() \
      .sort_values('times_sold', ascending=False)
But this messes up the 1_product_id index:
                           1_qty_sold  2_qty_sold  times_sold
1_product_id 2_product_id
5584         4384              159.00      653.00         153
7889         2970              104.00      497.00         102
5024         2970               89.00      413.00          87
2990         8310               71.00      283.00          71
             4384               71.00      282.00          68
             2970               62.00      240.00          58
5584         8310               56.00      208.00          54
How can I sort it by times_sold and group by 1_product_id & 2_product_id without losing the desired structure? I checked this answer but it did not help me.
Edit
I tried:
df_out.groupby(['1_product_id', '2_product_id']).sum() \
      .sort_values('times_sold', ascending=False) \
      .sort_index(0, ascending=False)
But the index is not in the order that I would like.
                           1_qty_sold  2_qty_sold  times_sold
1_product_id 2_product_id
7889         2970              104.00      497.00         102  # this should be the 2nd index
5584         4384              159.00      653.00         153  # this should be the 1st index
             8310               56.00      208.00          54
5024         2970               89.00      413.00          87  # this should be the 3rd index
2990         4384               71.00      282.00          68  # this should be the 4th index
             2970               62.00      240.00          58
             8310               71.00      283.00          71
After using @jezrael's solution I get:
                           times_sold  1_qty_sold  2_qty_sold
1_product_id 2_product_id
5584         4384                 153      159.00      653.00
7889         2970                 102      104.00      497.00
5024         2970                  87       89.00      413.00
2990         8310                  71       71.00      283.00
             4384                  68       71.00      282.00
             2970                  58       62.00      240.00
5584         8310                  54       56.00      208.00
While I am trying to make it look like this:
# note that the last row is now the second row and the `1_product_id` index values are unique
                           times_sold  1_qty_sold  2_qty_sold
1_product_id 2_product_id
5584         4384                 153      159.00      653.00
             8310                  54       56.00      208.00
7889         2970                 102      104.00      497.00
5024         2970                  87       89.00      413.00
2990         8310                  71       71.00      283.00
             4384                  68       71.00      282.00
             2970                  58       62.00      240.00
CodePudding user response:
To prevent sorting in groupby, add sort=False, and build a dict recording the original order of 1_product_id for mapping:
print(df_out)
   1_product_id  2_product_id  1_qty_sold  2_qty_sold  times_sold
0          5584          4384       159.0       653.0          10  # <- changed to test the sort
1          7889          2970       104.0       497.0         102
2          5024          2970        89.0       413.0          87
3          2990          8310        71.0       283.0          71
4          2990          4384        71.0       282.0          68
5          2990          2970        62.0       240.0          58
6          5584          8310        56.0       208.0          54
# map each 1_product_id to its rank of first appearance
uniq = {v: k for k, v in enumerate(df_out['1_product_id'].unique())}
print(uniq)
{5584: 0, 7889: 1, 5024: 2, 2990: 3}
df = (df_out.groupby(['1_product_id', '2_product_id'], sort=False)
            .sum()
            # sort the first index level by first-appearance rank, then by
            # times_sold descending; values missing from uniq fall back to themselves
            .sort_values(['1_product_id', 'times_sold'],
                         key=lambda x: x.map(uniq).fillna(x),
                         ascending=[True, False]))
print(df)
                           1_qty_sold  2_qty_sold  times_sold
1_product_id 2_product_id
5584         8310                56.0       208.0          54
             4384               159.0       653.0          10
7889         2970               104.0       497.0         102
5024         2970                89.0       413.0          87
2990         8310                71.0       283.0          71
             4384                71.0       282.0          68
             2970                62.0       240.0          58
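Note: the key parameter of sort_values requires pandas 1.1+. It is called once per sorted column/level, so x.map(uniq).fillna(x) converts the 1_product_id level to its first-appearance rank while passing the times_sold values through unchanged (they are not keys in uniq, so map returns NaN and fillna restores them).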
Alternative with another groupby:
df = (df_out.groupby(['1_product_id', '2_product_id'], sort=False)
            .sum()
            # regroup on the first level only, keep the original group order,
            # and sort each sub-frame by times_sold independently
            .groupby('1_product_id', sort=False, group_keys=False)
            .apply(lambda x: x.sort_values('times_sold', ascending=False)))
print(df)
                           1_qty_sold  2_qty_sold  times_sold
1_product_id 2_product_id
5584         8310                56.0       208.0          54
             4384               159.0       653.0          10
7889         2970               104.0       497.0         102
5024         2970                89.0       413.0          87
2990         8310                71.0       283.0          71
             4384                71.0       282.0          68
             2970                62.0       240.0          58
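Both versions produce the same ordering. The key-based sort is a single vectorized pass over the whole frame, while the groupby/apply variant calls sort_values once per group, which tends to be slower when there are many distinct 1_product_id values.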
CodePudding user response:
After you are done with your processing you can just add:

df_out.sort_index(level=0, sort_remaining=False)

This will fix your first index sorting problem; with sort_remaining=False the second level is left untouched, so the rows inside each group stay sorted by times_sold.
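A minimal sketch of this approach, assuming df_out is the original frame from the question (note that sort_index puts 1_product_id in ascending numeric order, not the first-appearance order shown in the desired output):

df = (df_out.groupby(['1_product_id', '2_product_id']).sum()
            .sort_values('times_sold', ascending=False)
            .sort_index(level=0, sort_remaining=False))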