I have a pandas dataframe (sample) shown below:
ColA MODEL_SCORE
B 300
A 400
L 500
K 600
C 400
...
I am using np.select to get my expected output. As you can see, I have to write the conditions out by hand, even though the threshold values are already in a list. How can I use this list to avoid writing the conditions manually? Thanks.
l = [443.42128478674164,
488.37523204592253,
518.0823073999817,
541.0359169945577,
555.8687207507057,
567.4177820456491,
579.8827874601552,
589.7055254683078,
599.4173064672602,
606.7602443130553,
614.6608818995334,
624.0346335587483,
632.7952850129415,
641.7055745252072,
650.3578400196975,
660.2332325374314,
670.7207392073833,
685.3945990076263,
705.084106536755,
788.1550777011911]
conditions = [recent['MODEL_SCORE'] <= 443.421285,
recent['MODEL_SCORE'] <= 488.375232,
recent['MODEL_SCORE'] <=518.082307,
recent['MODEL_SCORE'] <=541.035917,
recent['MODEL_SCORE'] <=555.868721,
recent['MODEL_SCORE'] <=567.417782,
recent['MODEL_SCORE'] <=579.882787,
recent['MODEL_SCORE'] <=589.705525,
recent['MODEL_SCORE'] <=599.417306,
recent['MODEL_SCORE'] <=606.760244,
recent['MODEL_SCORE'] <=614.660882,
recent['MODEL_SCORE'] <=624.034634,
recent['MODEL_SCORE'] <=632.795285,
recent['MODEL_SCORE'] <=641.705575,
recent['MODEL_SCORE'] <=650.357840,
recent['MODEL_SCORE'] <=660.233233,
recent['MODEL_SCORE'] <=670.720739,
recent['MODEL_SCORE'] <=685.394599,
recent['MODEL_SCORE'] <=705.084107,
recent['MODEL_SCORE'] <=788.155078]
choices = list(range(0,20))
recent['ranks'] = np.select(conditions,choices,default=99)
Expected output
ColA MODEL_SCORE ranks
B 300 0
A 400 0
L 500 2
K 600 9
C 400 0
...
CodePudding user response:
Use pd.cut with labels=False, and replace missing values with 99:
# prepend 0 as the lower edge of the first bin
l = [0] + l
df['ranks'] = (pd.cut(df['MODEL_SCORE'], bins=l, labels=False, right=False)
.fillna(99)
.astype(int))
print (df)
ColA MODEL_SCORE ranks
0 B 300 0
1 A 400 0
2 L 500 2
3 K 600 9
4 C 40000 99
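If the result has to match the original `<=` conditions exactly, including scores that fall right on a threshold or below 0, two alternatives may be worth a look. The sketch below is not part of the answer above; it assumes the list `l` is sorted in ascending order and rebuilds the sample frame under the name `df`. The column names `ranks_select` and `ranks_sorted` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Thresholds from the question (assumed sorted ascending).
l = [443.42128478674164, 488.37523204592253, 518.0823073999817,
     541.0359169945577, 555.8687207507057, 567.4177820456491,
     579.8827874601552, 589.7055254683078, 599.4173064672602,
     606.7602443130553, 614.6608818995334, 624.0346335587483,
     632.7952850129415, 641.7055745252072, 650.3578400196975,
     660.2332325374314, 670.7207392073833, 685.3945990076263,
     705.084106536755, 788.1550777011911]

# Sample data, with one score above the last threshold.
df = pd.DataFrame({"ColA": list("BALKC"),
                   "MODEL_SCORE": [300, 400, 500, 600, 40000]})

# Option 1: generate the np.select conditions from the list.
conditions = [df["MODEL_SCORE"] <= t for t in l]
choices = list(range(len(l)))
df["ranks_select"] = np.select(conditions, choices, default=99)

# Option 2: np.searchsorted with side="left" returns the index of the
# first threshold that is >= the score, which is the same rule as "<=".
idx = np.searchsorted(l, df["MODEL_SCORE"].to_numpy(), side="left")
# An index equal to len(l) means the score exceeds every threshold.
df["ranks_sorted"] = np.where(idx == len(l), 99, idx)

print(df)
```

Option 1 keeps the original np.select logic and only removes the hand-written list. Option 2 avoids building 20 boolean Series, which may matter on a large frame.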