Home > Software engineering >  How to sort pandas DataFrame by two columns collectively, rather than one then the other?
How to sort pandas DataFrame by two columns collectively, rather than one then the other?

Time:10-31

Say I have a dataframe, df, as below:

     Client    Number (#)    Volume ($)    Num. z-Score    Vol. z-Score

0    ABC       63            131.22        1.17            0.68
1    DEF       44            98.71         2.68            1.35
2    JKL       17            64.15         0.45            0.57
3    PQR       75            180.47        0.88            1.43
4    XYZ       28            75.93         0.23            3.96

I would like to sort it such that the maximum values of either of the last two columns appear as the first row. As z-Score tracks deviation from the mean with respect to SD, I am looking for the greatest deviation for either of my two measures (number, volume) and would rather not prioritise them.

For instance, the current method I am using: df.sort_values(['Num. z-Score','Vol. z-Score'], ascending=False) , is discriminatory as it sorts by 'Num. z-Score' first and would ONLY look at 'Vol. z-Score' if there were any equal values.

How can I instead sort the column in such a way that the final result looks as below:

     Client    Number (#)    Volume ($)    Num. z-Score    Vol. z-Score

4    XYZ       28            75.93         0.23            3.96
1    DEF       44            98.71         2.68            1.35
0    ABC       63            131.22        1.17            0.68
3    PQR       75            180.47        0.88            1.03
2    JKL       17            64.15         0.45            0.57

Any help would be greatly appreciated :)

CodePudding user response:

Looks like you're trying to find the max value between 'Num. z-Score' and 'Vol. z-Score'.

df['max_value'] = df[['Num. z-Score','Vol. z-Score']].max(axis=1)
df.sort_values(['max_value'], ascending=False)

CodePudding user response:

Use DataFrame.sort_index with maximum of both columns:

df = df.sort_index(key=df[['Vol. z-Score','Num. z-Score']].max(axis=1).get, ascending=False)
print (df)
  Client  Number (#)  Volume ($)  Num. z-Score  Vol. z-Score
4    XYZ          28       75.93          0.23          3.96
1    DEF          44       98.71          2.68          1.35
3    PQR          75      180.47          0.88          1.43
0    ABC          63      131.22          1.17          0.68
2    JKL          17       64.15          0.45          0.57

Alternative solution with Series.argsort and negative values for descending order:

df = df.iloc[(-df[['Vol. z-Score','Num. z-Score']].max(axis=1)).argsort()]
print (df)
  Client  Number (#)  Volume ($)  Num. z-Score  Vol. z-Score
4    XYZ          28       75.93          0.23          3.96
1    DEF          44       98.71          2.68          1.35
3    PQR          75      180.47          0.88          1.43
0    ABC          63      131.22          1.17          0.68
2    JKL          17       64.15          0.45          0.57
  • Related