Is there a way to rank values in a dataframe while allowing for a tolerance?
Say I have the following values:
ex = pd.Series([16.52,19.95,16.15,22.77,20.53,19.96])
and if I ran rank:
ex.rank(method='average')
0 2.0
1 3.0
2 1.0
3 6.0
4 5.0
5 4.0
dtype: float64
But what I'd like as a result would be (with a tolerance of 0.01):
0 2.0
1 3.5
2 1.0
3 6.0
4 5.0
5 3.5
Any way to define this tolerance?
Thanks
CodePudding user response:
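This function may work:
import pandas as pd

def rank_with_tolerance(sr, tolerance=0.01 + 1e-10, method='average'):
    # sorted unique values, indexed by themselves so they can act as a lookup
    vals = pd.Series(sr.unique()).sort_values()
    vals.index = vals
    # replace a value with its predecessor when the gap is within the tolerance
    vals = vals.mask(vals - vals.shift(1) <= tolerance, vals.shift(1))
    return sr.map(vals).fillna(sr).rank(method=method)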
It works for your given input:
ex = pd.Series([16.52,19.95,16.15,22.77,20.53,19.96])
rank_with_tolerance(ex, tolerance=0.01 + 1e-10, method='average')
# result:
0 2.0
1 3.5
2 1.0
3 6.0
4 5.0
5 3.5
dtype: float64
And with more complex sets it seems to work too:
ex = pd.Series([16.52,19.95,19.96, 19.95, 19.97, 19.97, 19.98])
rank_with_tolerance(ex, tolerance=0.01 + 1e-10, method='average')
# result:
0 1.0
1 3.0
2 3.0
3 3.0
4 5.5
5 5.5
6 7.0
dtype: float64
CodePudding user response:
You could do some sort of min-max scaling, i.e.:
ex = pd.Series([16.52,19.95,16.15,22.77,20.53,19.96])
# You scale the values to be between 0 and 1
ex_scaled = (ex - min(ex)) / (max(ex) - min(ex))
# You put them on a scale from 1 to len(ex) + 1
result = ex_scaled * len(ex) + 1
# result
0 1.335347
1 4.444109
2 1.000000
3 7.000000
4 4.969789
5 4.453172
That way you are still ranking, but values close to each other get ranks close to each other.
CodePudding user response:
You can sort the values, merge the close ones and rank on that:
s = ex.drop_duplicates().sort_values()
mapper = (s.groupby(s.diff().abs().gt(0.011).cumsum(), sort=False)
           .transform('mean')
           .reindex_like(ex)
         )
out = mapper.rank(method='average')
N.B. I used 0.011 as the threshold because floating-point arithmetic is not always precise enough to detect a value close to the threshold.
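To illustrate the floating-point point, here is a small check using plain Python floats, which are the same IEEE 754 doubles a float64 Series stores. The variable name `gap` is just for illustration:

```python
# 19.95 and 19.96 cannot be represented exactly in binary floating point,
# so their difference comes out slightly larger than 0.01.
gap = 19.96 - 19.95

print(gap)           # a value just above 0.01
print(gap <= 0.01)   # False: a strict 0.01 tolerance misses this pair
print(gap <= 0.011)  # True: a small margin catches it
```

Any margin that is small compared to the spacing of your data should do; 0.011 is one choice, and 0.01 + 1e-10 would be a tighter one.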
output:
0 2.0
1 3.5
2 1.0
3 6.0
4 5.0
5 3.5
dtype: float64
intermediate mapper:
0 16.520
1 19.955
2 16.150
3 22.770
4 20.530
5 19.955
dtype: float64
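One caveat with the snippet above: `reindex_like(ex)` aligns on index labels, so if the series contained exact repeats, the rows removed by `drop_duplicates()` would come back as NaN. Below is a minimal sketch that maps by value instead, assuming a float Series without NaN. The helper name `rank_merging_close_values` and its parameters are made up for illustration:

```python
import pandas as pd

def rank_merging_close_values(values, threshold=0.011, method="average"):
    # Sorted unique values; repeats in `values` are handled by the map below.
    s = pd.Series(values.unique()).sort_values()
    # Start a new group whenever the gap to the previous value exceeds the threshold.
    groups = s.diff().abs().gt(threshold).cumsum()
    # Every member of a group is replaced by the group mean.
    merged = s.groupby(groups).transform("mean")
    # Lookup table: original value -> merged value.
    lookup = pd.Series(merged.to_numpy(), index=s.to_numpy())
    return values.map(lookup).rank(method=method)

ex = pd.Series([16.52, 19.95, 16.15, 22.77, 20.53, 19.96])
print(rank_merging_close_values(ex).tolist())
# [2.0, 3.5, 1.0, 6.0, 5.0, 3.5]

dup = pd.Series([16.52, 19.95, 19.96, 19.95])
print(rank_merging_close_values(dup).tolist())
# [1.0, 3.0, 3.0, 3.0]
```

Note that this approach merges chains of close values: 19.95, 19.96 and 19.97 would all end up with the same rank, because each one is within the threshold of its neighbour.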