Say I have a dataframe, df
, as below:
Client Number (#) Volume ($) Num. z-Score Vol. z-Score
0 ABC 63 131.22 1.17 0.68
1 DEF 44 98.71 2.68 1.35
2 JKL 17 64.15 0.45 0.57
3 PQR 75 180.47 0.88 1.43
4 XYZ 28 75.93 0.23 3.96
I would like to sort it such that the maximum values of either of the last two columns appear as the first row. As z-Score tracks deviation from the mean with respect to SD, I am looking for the greatest deviation for either of my two measures (number, volume) and would rather not prioritise them.
For instance, the current method I am using: df.sort_values(['Num. z-Score','Vol. z-Score'], ascending=False)
, is discriminatory as it sorts by 'Num. z-Score'
first and would ONLY look at 'Vol. z-Score'
if there were any equal values.
How can I instead sort the column in such a way that the final result looks as below:
Client Number (#) Volume ($) Num. z-Score Vol. z-Score
4 XYZ 28 75.93 0.23 3.96
1 DEF 44 98.71 2.68 1.35
0 ABC 63 131.22 1.17 0.68
3 PQR 75 180.47 0.88 1.03
2 JKL 17 64.15 0.45 0.57
Any help would be greatly appreciated :)
CodePudding user response:
Looks like you're trying to find the max value between 'Num. z-Score' and 'Vol. z-Score'.
df['max_value'] = df[['Num. z-Score','Vol. z-Score']].max(axis=1)
df.sort_values(['max_value'], ascending=False)
CodePudding user response:
Use DataFrame.sort_index
with maximum of both columns:
df = df.sort_index(key=df[['Vol. z-Score','Num. z-Score']].max(axis=1).get, ascending=False)
print (df)
Client Number (#) Volume ($) Num. z-Score Vol. z-Score
4 XYZ 28 75.93 0.23 3.96
1 DEF 44 98.71 2.68 1.35
3 PQR 75 180.47 0.88 1.43
0 ABC 63 131.22 1.17 0.68
2 JKL 17 64.15 0.45 0.57
Alternative solution with Series.argsort
and negative values for descending order:
df = df.iloc[(-df[['Vol. z-Score','Num. z-Score']].max(axis=1)).argsort()]
print (df)
Client Number (#) Volume ($) Num. z-Score Vol. z-Score
4 XYZ 28 75.93 0.23 3.96
1 DEF 44 98.71 2.68 1.35
3 PQR 75 180.47 0.88 1.43
0 ABC 63 131.22 1.17 0.68
2 JKL 17 64.15 0.45 0.57