Classify a value under certain conditions in pandas dataframe

Time:09-30

I have this dataframe:

value  limit_1  limit_2  limit_3  limit_4
   10      2        3        7       10
   11      5        6       11       13
    2      0.3      0.9      2.01     2.99

I want to add another column called CLASS that classifies the value column this way:

if value <= limit_1 then 1
if value > limit_1 and value <= limit_2 then 2
if value > limit_2 and value <= limit_3 then 3
if value > limit_3 then 4

to get this result:

value  limit_1  limit_2  limit_3  limit_4  CLASS
   10      2        3        7       10        4
   11      5        6       11       13        3
    2      0.3      0.9      2.01     2.99     3

I know I could get these ifs to work row by row, but my dataframe has 2 million rows and I need the fastest way to perform this classification.

I tried to use the .cut function, but the result was not what I expected/wanted.

Thanks

CodePudding user response:

We can use the rank method over the column axis (axis=1):

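# Rank only value and the first three limits; including limit_4 would give 5 when value > limit_4
df["CLASS"] = df[["value", "limit_1", "limit_2", "limit_3"]].rank(axis=1, method="first").iloc[:, 0].astype(int)
   value  limit_1  limit_2  limit_3  limit_4  CLASS
0     10      2.0      3.0     7.00    10.00      4
1     11      5.0      6.0    11.00    13.00      3
2      2      0.3      0.9     2.01     2.99      3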
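
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the rank idea on the sample data from the question. It assumes the limits are sorted in increasing order within each row, and it ranks only value and the first three limits so the result stays within 1 to 4. The variable names cols and ranks are just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "value":   [10, 11, 2],
    "limit_1": [2, 5, 0.3],
    "limit_2": [3, 6, 0.9],
    "limit_3": [7, 11, 2.01],
    "limit_4": [10, 13, 2.99],
})

# Assumption: limit_1 <= limit_2 <= limit_3 in every row.
# method="first" breaks ties by column order, so a value equal to a
# limit ranks below that limit, which matches the "<=" rules.
cols = ["value", "limit_1", "limit_2", "limit_3"]
ranks = df[cols].rank(axis=1, method="first")
df["CLASS"] = ranks["value"].astype(int)

print(df["CLASS"].tolist())
```

The tie-breaking only works because value is the first column in cols; if the column order changes, ties would be classified differently.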

CodePudding user response:

We can use np.select:

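import numpy as np

# np.select uses the first matching condition, so the inclusive bounds
# of between() never override an earlier class.
conditions = [df["value"] <= df["limit_1"],
              df["value"].between(df["limit_1"], df["limit_2"]),
              df["value"].between(df["limit_2"], df["limit_3"]),
              df["value"] > df["limit_3"]]

df["CLASS"] = np.select(conditions, [1, 2, 3, 4])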

>>> df
   value  limit_1  limit_2  limit_3  limit_4  CLASS
0     10      2.0      3.0     7.00    10.00      4
1     11      5.0      6.0    11.00    13.00      3
2      2      0.3      0.9     2.01     2.99      3
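
Since speed matters with 2 million rows, another option (not from the answers above, just a sketch) is to count how many limits each value strictly exceeds and add 1. This assumes the limits are increasing within each row; the names v and limits are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "value":   [10, 11, 2],
    "limit_1": [2, 5, 0.3],
    "limit_2": [3, 6, 0.9],
    "limit_3": [7, 11, 2.01],
    "limit_4": [10, 13, 2.99],
})

# Assumption: limit_1 <= limit_2 <= limit_3 in every row.
v = df["value"].to_numpy()
limits = df[["limit_1", "limit_2", "limit_3"]].to_numpy()

# Broadcast value against the three limits, count the strict
# "greater than" hits per row, then shift to 1-based classes.
df["CLASS"] = 1 + (v[:, None] > limits).sum(axis=1)

print(df["CLASS"].tolist())
```

This stays fully vectorized in NumPy, so it avoids Python-level loops; I have not benchmarked it against the other two approaches.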