I have a pandas dataframe
for which I need to sort (by ascending
) the values by two columns with the output being a "middle ground" of the two columns.
An example is shown bellow. When I use sort_values
it sorts by the first columns and considers the second one only for duplicate values. I, however, need to get the row that have the combinaison of lower values for both columns (which is the 3rd one in the ouput bellow).
test = pd.DataFrame({'file':[1,2,3,4,5,6], 'rmse':[66,41,43,39,40,42], 'var':[44,177,201,321,349,379]})
test.sort_values(by=['rmse', 'var'], ascending=[True, True])
Output :
file rmse var
3 4 39 321 <--- First row given by `sort_values`
4 5 40 349
1 2 41 177 <--- Row that I need
5 6 42 379
2 3 43 201
0 1 66 44
I'm not sure how to phrase my question properly in English so please tell me if I need to make my question more clear.
CodePudding user response:
IIUC, let's use rank
, mean
, and argsort
:
test.iloc[test[['var', 'rmse']].rank().mean(axis=1).argsort()]
Output:
file rmse var
1 2 41 177
3 4 39 321
0 1 66 44
4 5 40 349
2 3 43 201
5 6 42 379
Details, rank the values in each column, then average the ranks for each row and sort the mean ranks to determine row order.
CodePudding user response:
I've tried all the methods of df.sort.values
but instead of that you can try a for loop like this :
import pandas as pd
test = pd.DataFrame({'file':[1,2,3,4,5,6], 'rmse':[66,41,43,39,40,42], 'var':[44,177,201,321,349,379]})
for i in test:
test[i]=sorted(test[i])
print(test)
Output :
file rmse var
0 1 39 44
1 2 40 177
2 3 41 201
3 4 42 321
4 5 43 349
5 6 66 379