Home > Blockchain >  Create new column and assign values from 1 to 100 based on percentiles
Create new column and assign values from 1 to 100 based on percentiles

Time:08-30

I am very new to pandas and Python in general.
I have a dataframe with many columns, one of them is score_init:

 ---------- 
|score_init|
 ---------- 
|     38.27|
|     39.27|
|     29.16|
|     32.60|
|     40.45|
|     19.49|
|     48.27|
|     14.25|
|     32.64|
|     32.97|
 ---------- 

For this column I need to calculate percentiles from 0 to 100 and based on these values create an additional column (score_new) where all score_init values that lie between 99th and 100th percentiles (99% < score_init <= 100%) are assigned 1, between 98th and 99th percentiles (98% < score <= 99%) - assigned 2, between 97th and 98th percentiles (97% < score <= 98%) - assigned 3, and so on.
For values greater than the 100th percentile, also assign 1; for values less than 0 percentile, assign 100.
Desired output:

 ---------- ---------- 
|score_init| score_new|
 ---------- ---------- 
|     38.27|        34|
|     39.27|        23|
|     29.16|        78|
|     32.60|        67|
|     40.45|        12|
|     19.49|        89|
|     48.27|         1|
|     14.25|       100|
|     32.64|        56|
|     32.97|        45|
 ---------- ---------- 

CodePudding user response:

Use qcut:

df['scores_new'] = 100 - pd.qcut(df['score_init'], 100).cat.codes

output:

   score_init  scores_new
0       38.27          34
1       39.27          23
2       29.16          78
3       32.60          67
4       40.45          12
5       19.49          89
6       48.27           1
7       14.25         100
8       32.64          56
9       32.97          45

You can generalize to any quantile, example for quartiles:

N = 4
df['scores_new'] = N - pd.qcut(df['score_init'], N).cat.codes
  • Related