Classify the rows in dataframe based on their maximum

Time:02-17

I have a very big dataframe and I don't want to iterate over each row. I want to classify the rows based on the maximum value of each row: if the maximum is greater than 0.45 the row is in class 3, if it is less than 0.2 it is in class 1, and otherwise (between 0.2 and 0.45) it is in class 2. Here is a sample of the dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['c0'] = [0.4656, 0.1530, 0.1854]
df['c1'] = [0.4452, 0.2064, 0.1416]
df['c2'] = [0.4224, 0.4224, 0.1800]
# row-wise maximum of the three value columns
df['max'] = df.max(axis=1)
print(df)

The dataframe I want is the same one with an extra class column that holds 1, 2 or 3 for each row.

Could you please help me with that?

Answer:

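IIUC, you could use np.select to assign a class to every row in one vectorized step. The conditions are checked in order, the first one that holds picks the matching value from the choice list, and rows that match none of them get the default value, 2: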

# max > 0.45 -> 3, max < 0.2 -> 1, anything else -> 2
df['class'] = np.select([df['max'] > 0.45, df['max'] < 0.2], [3, 1], default=2)

Output:

       c0      c1      c2     max  class
0  0.4656  0.4452  0.4224  0.4656      3
1  0.1530  0.2064  0.4224  0.4224      2
2  0.1854  0.1416  0.1800  0.1854      1
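If you prefer to think of this as binning, here is a minimal sketch using pd.cut on the same sample data (an alternative to np.select, not part of the original answer). It takes the maximum over the value columns c0, c1 and c2 only, so adding a 'max' or 'class' column later cannot change the result. One boundary difference to check against your rules: with right=False the bins are half-open [a, b), so a maximum of exactly 0.45 lands in class 3, whereas the np.select version above puts it in class 2; a maximum of exactly 0.2 gives class 2 in both.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c0": [0.4656, 0.1530, 0.1854],
    "c1": [0.4452, 0.2064, 0.1416],
    "c2": [0.4224, 0.4224, 0.1800],
})

# Row-wise maximum over the value columns only, so helper columns
# such as 'max' or 'class' are never included by accident.
row_max = df[["c0", "c1", "c2"]].max(axis=1)

# right=False makes the bins half-open [a, b):
# [-inf, 0.2) -> 1, [0.2, 0.45) -> 2, [0.45, inf) -> 3
bins = [-np.inf, 0.2, 0.45, np.inf]
df["class"] = pd.cut(row_max, bins=bins, labels=[1, 2, 3], right=False).astype(int)
print(df)
```

pd.cut returns a categorical column, which is why the result is converted with .astype(int) so that it behaves like the integer column np.select produces.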