I am very new to pandas
and Python in general.
I have a dataframe with many columns, one of them is score_init
:
----------
|score_init|
----------
| 38.27|
| 39.27|
| 29.16|
| 32.60|
| 40.45|
| 19.49|
| 48.27|
| 14.25|
| 32.64|
| 32.97|
----------
For this column I need to calculate percentiles from 0 to 100 and based on these values create an additional column (score_new
) where all score_init
values that lie between 99th and 100th percentiles (99% < score_init <= 100%
) are assigned 1, between 98th and 99th percentiles (98% < score <= 99%
) - assigned 2, between 97th and 98th percentiles (97% < score <= 98%
) - assigned 3, and so on.
For values greater than the 100th percentile, also assign 1; for values less than 0 percentile, assign 100.
Desired output:
---------- ----------
|score_init| score_new|
---------- ----------
| 38.27| 34|
| 39.27| 23|
| 29.16| 78|
| 32.60| 67|
| 40.45| 12|
| 19.49| 89|
| 48.27| 1|
| 14.25| 100|
| 32.64| 56|
| 32.97| 45|
---------- ----------
CodePudding user response:
Use qcut
:
df['scores_new'] = 100 - pd.qcut(df['score_init'], 100).cat.codes
output:
score_init scores_new
0 38.27 34
1 39.27 23
2 29.16 78
3 32.60 67
4 40.45 12
5 19.49 89
6 48.27 1
7 14.25 100
8 32.64 56
9 32.97 45
You can generalize to any quantile, example for quartiles:
N = 4
df['scores_new'] = N - pd.qcut(df['score_init'], N).cat.codes