Home > Mobile >  Sorting a multiindex dataframe by column without losing the order of indexes
Sorting a multiindex dataframe by column without losing the order of indexes

Time:11-26

I have a df that has this structure:

1_product_id        2_product_id        1_qty_sold              2_qty_sold              times_sold
5584                4384                159.00                  653.00                  153
7889                2970                104.00                  497.00                  102
5024                2970                89.00                   413.00                  87
2990                8310                71.00                   283.00                  71
2990                4384                71.00                   282.00                  68
2990                2970                62.00                   240.00                  58
5584                8310                56.00                   208.00                  54

I am trying to make it look like this:

1_product_id        2_product_id        1_qty_sold              2_qty_sold              times_sold
5584                4384                159.00                  653.00                  153
                    8310                56.00                   208.00                  54
7889                2970                104.00                  497.00                  102
5024                2970                89.00                   413.00                  87
2990                8310                71.00                   283.00                  71
                    4384                71.00                   282.00                  68
                    2970                62.00                   240.00                  58

Where it is sorted by times_sold and grouped by 1_product_id and 2_product_id. I tried:

df_out.groupby(['1_product_id','2_product_id']).sum() \ 
                                               .sort_values('times_sold', ascending = False)

But this messes up the 1_prodct_id index:

                                            1_qty_despatched        2_qty_despatched    times_sold
1_product_id        2_product_id            
5584                4384                    159.00                  653.00              153
7889                2970                    104.00                  497.00              102
5024                2970                    89.00                   413.00              87
2990                8310                    71.00                   283.00              71
                    4384                    71.00                   282.00              68
                    2970                    62.00                   240.00              58
5584                8310                    56.00                   208.00              54

How can I sort it by times_sold and group by 1_product_id & 2_product_id without losing the desired structure? I checked this answer but it did not help me.

Edit

I tried:

df_out.groupby(['1_product_id','2_product_id']).sum() \ 
                                               .sort_values('times_sold', ascending = False) \
                                               .sort_index(0, ascending = False)

But the index is not in the order that I would like.

                                        1_qty_sold              2_qty_sold              times_sold
1_product_id        2_product_id            
7889                2970                104.00                  497.00                  102 # this should be the 2nd index
5584                4384                159.00                  653.00                  153 # this should be the 1st index
                    8310                56.00                   208.00                  54 
5024                2970                89.00                   413.00                  87 # this should be the 3rd index
2990                4384                71.00                   282.00                  68 # this should be the 4th index
                    2970                62.00                   240.00                  58
                    8310                71.00                   283.00                  71

After using @jezrael solution I get:

                                            times_sold  1_qty_sold  2_qty_sold
1_product_id        2_product_id            
5584                4384                    153         159.00      653.00
7889                2970                    102         104.00      497.00
5024                2970                    87          89.00       413.00
2990                8310                    71          71.00       283.00
                    4384                    68          71.00       282.00
                    2970                    58          62.00       240.00
5584                8310                    54          56.00       208.00

While I am trying to make it look like this:

# note that the last row is now the second row and indexes `1_product_id` is unique.

                                            times_sold  1_qty_sold  2_qty_sold
1_product_id        2_product_id            
5584                4384                    153         159.00      653.00
                    8310                    54          56.00       208.00 
7889                2970                    102         104.00      497.00
5024                2970                    87          89.00       413.00
2990                8310                    71          71.00       283.00
                    4384                    68          71.00       282.00
                    2970                    58          62.00       240.00

CodePudding user response:

For prevent sorting in groupby add sort=False with dict for original order for mapping:

print (df_out)
   1_product_id  2_product_id  1_qty_sold  2_qty_sold  times_sold
0          5584          4384       159.0       653.0          10 <- changed for test sort
1          7889          2970       104.0       497.0         102
2          5024          2970        89.0       413.0          87
3          2990          8310        71.0       283.0          71
4          2990          4384        71.0       282.0          68
5          2990          2970        62.0       240.0          58
6          5584          8310        56.0       208.0          54


uniq = {v: k for k, v in dict(enumerate(df_out['1_product_id'].unique())).items()}
print(uniq)
{5584: 0, 7889: 1, 5024: 2, 2990: 3}

df = (df_out.groupby(['1_product_id','2_product_id'], sort=False)
            .sum()
            .sort_values(['1_product_id', 'times_sold'],
                          key=lambda x: x.map(uniq).fillna(x), 
                          ascending=[True, False])

            )

print (df)
                           1_qty_sold  2_qty_sold  times_sold
1_product_id 2_product_id                                    
5584         8310                56.0       208.0          54
             4384               159.0       653.0          10
7889         2970               104.0       497.0         102
5024         2970                89.0       413.0          87
2990         8310                71.0       283.0          71
             4384                71.0       282.0          68
             2970                62.0       240.0          58

Alternative with another groupby:

df = (df_out.groupby(['1_product_id','2_product_id'], sort=False)
            .sum()
            .groupby('1_product_id', sort=False, group_keys=False)
            .apply(lambda x: x.sort_values('times_sold', ascending=False))
            )

print (df)
                           1_qty_sold  2_qty_sold  times_sold
1_product_id 2_product_id                                    
5584         8310                56.0       208.0          54
             4384               159.0       653.0          10
7889         2970               104.0       497.0         102
5024         2970                89.0       413.0          87
2990         8310                71.0       283.0          71
             4384                71.0       282.0          68
             2970                62.0       240.0          58

CodePudding user response:

After you are done with your processing you can just add

df_out.sort_index(0)

This will fix your first index sorting problem. Other ones will be already sorted based on times sold.

  • Related