How to sort by multiple columns for all values and not just duplicates

I have a pandas dataframe for which I need to sort (by ascending) the values by two columns with the output being a "middle ground" of the two columns.

An example is shown bellow. When I use sort_values it sorts by the first columns and considers the second one only for duplicate values. I, however, need to get the row that have the combinaison of lower values for both columns (which is the 3rd one in the ouput bellow).

test = pd.DataFrame({'file':[1,2,3,4,5,6], 'rmse':[66,41,43,39,40,42], 'var':[44,177,201,321,349,379]})
test.sort_values(by=['rmse', 'var'], ascending=[True, True])

Output :

    file    rmse    var
3   4       39      321  <--- First row given by `sort_values`
4   5       40      349
1   2       41      177  <--- Row that I need
5   6       42      379
2   3       43      201
0   1       66      44

I'm not sure how to phrase my question properly in English so please tell me if I need to make my question more clear.

CodePudding user response：

IIUC, let's use rank, mean, and argsort:

test.iloc[test[['var', 'rmse']].rank().mean(axis=1).argsort()]

Output:

   file  rmse  var
1     2    41  177
3     4    39  321
0     1    66   44
4     5    40  349
2     3    43  201
5     6    42  379

Details, rank the values in each column, then average the ranks for each row and sort the mean ranks to determine row order.

CodePudding user response：

I've tried all the methods of df.sort.values but instead of that you can try a for loop like this :

import pandas as pd

test = pd.DataFrame({'file':[1,2,3,4,5,6], 'rmse':[66,41,43,39,40,42], 'var':[44,177,201,321,349,379]})

for i in test:
    test[i]=sorted(test[i])
    print(test)

Output :

      file  rmse  var
  0     1    39   44
  1     2    40  177
  2     3    41  201
  3     4    42  321
  4     5    43  349
  5     6    66  379