Assign conditions from a list np.select and create a new column (pandas)

Time:11-03

I have a pandas dataframe (sample) shown below:

ColA   MODEL_SCORE
B      300
A      400
L      500
K      600
C      400

...

I am using np.select to get my expected output. As you can see, I have to write the conditions out by hand, even though the threshold values are already in a list. How can I use this list to avoid writing the conditions manually? Thanks.

l = [443.42128478674164,
 488.37523204592253,
 518.0823073999817,
 541.0359169945577,
 555.8687207507057,
 567.4177820456491,
 579.8827874601552,
 589.7055254683078,
 599.4173064672602,
 606.7602443130553,
 614.6608818995334,
 624.0346335587483,
 632.7952850129415,
 641.7055745252072,
 650.3578400196975,
 660.2332325374314,
 670.7207392073833,
 685.3945990076263,
 705.084106536755,
 788.1550777011911]
conditions = [
recent['MODEL_SCORE'] <= 443.421285,
recent['MODEL_SCORE'] <= 488.375232,
recent['MODEL_SCORE'] <=518.082307,
recent['MODEL_SCORE'] <=541.035917,
recent['MODEL_SCORE'] <=555.868721,
recent['MODEL_SCORE'] <=567.417782,
recent['MODEL_SCORE'] <=579.882787,
recent['MODEL_SCORE'] <=589.705525,
recent['MODEL_SCORE'] <=599.417306,
recent['MODEL_SCORE'] <=606.760244,
recent['MODEL_SCORE'] <=614.660882,
recent['MODEL_SCORE'] <=624.034634,
recent['MODEL_SCORE'] <=632.795285,
recent['MODEL_SCORE'] <=641.705575,
recent['MODEL_SCORE'] <=650.357840,
recent['MODEL_SCORE'] <=660.233233,
recent['MODEL_SCORE'] <=670.720739,
recent['MODEL_SCORE'] <=685.394599,
recent['MODEL_SCORE'] <=705.084107,
recent['MODEL_SCORE'] <=788.155078]

choices = list(range(0,20))
recent['ranks'] = np.select(conditions,choices,default=99)

Expected output

ColA   MODEL_SCORE  ranks
B      300           0
A      400           0
L      500           2
K      600           9
C      400           0
...

CodePudding user response:

Use pd.cut with labels=False to get integer bin positions, then replace the missing values (scores outside every bin) with 99:

# prepend 0 so the first bin starts at 0
l = [0] + l
df['ranks'] = (pd.cut(df['MODEL_SCORE'], bins=l, labels=False, right=False)
                 .fillna(99)
                 .astype(int))
print (df)
  ColA  MODEL_SCORE  ranks
0    B          300      0
1    A          400      0
2    L          500      2
3    K          600      9
4    C        40000     99
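
If the exact `<=` semantics of the original np.select matter, the conditions can also be generated from the list directly. The sketch below is not part of the answer above; it assumes the thresholds in `l` are sorted in ascending order, and the extra row `Z` with a score of 40000 is made up to show the default value. It also shows np.searchsorted as an equivalent lookup. Note that pd.cut with right=False puts a score exactly equal to a threshold into the next bin, and scores below 0 become 99 rather than 0, so the results may differ from np.select at those edges.

```python
import numpy as np
import pandas as pd

# Thresholds from the question, in ascending order.
l = [443.42128478674164, 488.37523204592253, 518.0823073999817,
     541.0359169945577, 555.8687207507057, 567.4177820456491,
     579.8827874601552, 589.7055254683078, 599.4173064672602,
     606.7602443130553, 614.6608818995334, 624.0346335587483,
     632.7952850129415, 641.7055745252072, 650.3578400196975,
     660.2332325374314, 670.7207392073833, 685.3945990076263,
     705.084106536755, 788.1550777011911]

# Sample rows from the question, plus one hypothetical row ('Z')
# whose score lies above the last threshold.
recent = pd.DataFrame({'ColA': list('BALKC') + ['Z'],
                       'MODEL_SCORE': [300, 400, 500, 600, 400, 40000]})

# Option 1: build the conditions from the list instead of typing them out.
# np.select picks the choice of the first condition that is True.
conditions = [recent['MODEL_SCORE'] <= t for t in l]
choices = list(range(len(l)))
recent['ranks'] = np.select(conditions, choices, default=99)

# Option 2: binary search over the sorted thresholds.
# side='left' returns the index of the first threshold >= score,
# which is the same rule as the chain of <= conditions.
idx = np.searchsorted(l, recent['MODEL_SCORE'].to_numpy(), side='left')
# An index equal to len(l) means the score exceeds every threshold.
recent['ranks_sorted'] = np.where(idx == len(l), 99, idx)

print(recent)
```

Both columns should agree row by row. The searchsorted version avoids building 20 boolean Series, which may matter on large frames.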