I have a dataframe with returns from various investments in %. sort.values does not correctly order my returns. For example I just want to simply see the TEST column returns sorted by lowest to highest or vice versa. Please look at the test output, it is not correct.
df.sort_values('TEST')
gives me an output of returns that are NOT sorted correctly. Sort values code not in correct order
Also I am having an issue where it sorts positive numbers lowest to highest, then half way down starts again for negative numbers lowest to highest.
I just want it to look like following: -3% -1% -0.5% 1% 2% 5%
CodePudding user response:
Go for numpy.lexsort
and boolean indexing :
import numpy as np
arr = np.array([float(x.rstrip("%")) for x in df["TEST"]])
idx = np.lexsort((arr,))
df = df.iloc[idx]
Output :
print(df)
TEST
0 -3%
1 -1%
2 -0.5%
3 1%
4 2%
5 5%
Input used :
df = pd.DataFrame({"TEST": ["1%", "-3%","-0.5%", "-1%", "5%", "2%"]})
TEST
0 1%
1 -3%
2 -0.5%
3 -1%
4 5%
5 2%
CodePudding user response:
The issue is that the lexicographic order of string is different from the natural order (1->10->2 vs 1->2->10).
One option using the key
parameter of sort_values
:
df.sort_values('TEST', key=lambda s: pd.to_numeric(s.str.extract(r'(-?\d \.?\d*)', expand=False)))
Or:
df.sort_values('TEST', key=lambda s: pd.to_numeric(s.str.rstrip('%')))
Output:
TEST
1 -3%
3 -1%
2 -0.5%
0 1%
5 2%
4 5%