I have a small dataframe on synthetic scores and the difference between the two:
score_1 score_2 diff
10 15 5
7 9 2
13 23 10
12 4 -8
...
There are quite a few rows so I want to order the dataframe by the largest difference between score_1 and score_2. I'm currently using:
df.sort_values(by=['diff'], ascending=False)
But this returns:
score_1 score_2 diff
13 23 10
10 15 5
7 9 2
12 4 -8
Whereas what I'm looking for is:
score_1 score_2 diff
13 23 10
12 4 -8
10 15 5
7 9 2
For it to return the biggest difference in descending order regardless of whether it is a negative or positive difference
CodePudding user response:
Use:
df = df.sort_values(by=['diff'], ascending=False, key=lambda x: x.abs())
print (df)
score_1 score_2 diff
2 13 23 10
3 12 4 -8
0 10 15 5
1 7 9 2
CodePudding user response:
Use:
df2 = df.sort_values(by='diff', ascending=False, key=abs)
or, alternatively:
df2 = df.loc[df['diff'].abs().sort_values(ascending=False).index]
output:
score_1 score_2 diff
2 13 23 10
3 12 4 -8
0 10 15 5
1 7 9 2
CodePudding user response:
The two other answers are more pertinent but I give you another way with rank
:
>>> df.iloc[df['diff'].abs().rank(method='dense', ascending=False).sub(1)]
score_1 score_2 diff
2 13 23 10
3 12 4 -8
0 10 15 5
1 7 9 2