Create variables based on conditions for different groups

Time:10-25

I want to label each row by its value. Here is what I have done:

import numpy as np
import pandas as pd

data=pd.DataFrame({'Group':['group1','group1','group1','group2','group2','group2'],
                 'Value':[30,40,10,40,60,70]})

conditions=[(data['Value']<50) & (data['Value']>=40),
            (data['Value']<40) & (data['Value']>=30)]

results=['Large','Small']

data['Label']=np.select(conditions,results,default='Other')
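For reference, here is a self-contained version of the snippet above with the labels it should produce. `np.select` checks the conditions in order and takes the result of the first one that is True for each row; rows that match none get the `default`.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Group': ['group1', 'group1', 'group1', 'group2', 'group2', 'group2'],
                     'Value': [30, 40, 10, 40, 60, 70]})

# First matching condition wins; unmatched rows fall back to the default
conditions = [(data['Value'] < 50) & (data['Value'] >= 40),
              (data['Value'] < 40) & (data['Value'] >= 30)]
results = ['Large', 'Small']

data['Label'] = np.select(conditions, results, default='Other')
print(data['Label'].tolist())
# ['Small', 'Large', 'Other', 'Large', 'Other', 'Other']
```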

It works fine. However, I want to do this per group, with different thresholds for each group. I can of course do the following:

data=pd.DataFrame({'Group':['group1','group1','group1','group2','group2','group2'],
                 'Value':[30,40,10,40,60,70]})

conditions=[(data.loc[data['Group']=='group1','Value']<50) & (data.loc[data['Group']=='group1','Value']>=40),
            (data.loc[data['Group']=='group1','Value']<40) & (data.loc[data['Group']=='group1','Value']>=30)]

results=['Large','Small']

data.loc[data['Group']=='group1','Label']=np.select(conditions,results,default='Other')

# Keep the same DataFrame for group2 (re-creating it here would discard group1's labels)

conditions=[(data.loc[data['Group']=='group2','Value']<60) & (data.loc[data['Group']=='group2','Value']>=50),
            (data.loc[data['Group']=='group2','Value']<50) & (data.loc[data['Group']=='group2','Value']>=40)]

results=['Large','Small']

data.loc[data['Group']=='group2','Label']=np.select(conditions,results,default='Other')
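The two per-group blocks above differ only in their thresholds. As a hedged sketch (the `thresholds` dict and its layout are assumptions for illustration, not part of the question), each row's thresholds can be looked up from its group with `Series.map`, so a single `np.select` call labels every group at once. A group missing from the dict gets NaN thresholds, every comparison is then False, and its rows fall through to `'Other'`.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Group': ['group1', 'group1', 'group1', 'group2', 'group2', 'group2'],
                     'Value': [30, 40, 10, 40, 60, 70]})

# Hypothetical threshold table: 'Small' is [lower, mid), 'Large' is [mid, upper)
thresholds = {'group1': {'lower': 30, 'mid': 40, 'upper': 50},
              'group2': {'lower': 40, 'mid': 50, 'upper': 60}}

# One Series per threshold, aligned row by row with data
lower = data['Group'].map({g: t['lower'] for g, t in thresholds.items()})
mid = data['Group'].map({g: t['mid'] for g, t in thresholds.items()})
upper = data['Group'].map({g: t['upper'] for g, t in thresholds.items()})

value = data['Value']
conditions = [(value >= mid) & (value < upper),
              (value >= lower) & (value < mid)]
data['Label'] = np.select(conditions, ['Large', 'Small'], default='Other')
print(data['Label'].tolist())
# ['Small', 'Large', 'Other', 'Small', 'Other', 'Other']
```

Adding a group then means adding one entry to `thresholds` rather than copying another block of conditions.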

But I am looking for a more elegant solution, especially since my real dataset has more groups and more conditions.

Answer:

You can generalize the repeated code with a function that builds the conditions for a given group and set of thresholds:

# Uses the global `data` DataFrame from the question
def conditions(upper, mid, lower, group):
    # 'Large' rows: mid <= Value < upper; 'Small' rows: lower <= Value < mid
    values = data.loc[data['Group'] == group, 'Value']
    return [(values < upper) & (values >= mid),
            (values < mid) & (values >= lower)]

results=['Large','Small']

data.loc[data['Group']=='group1','Label'] = np.select(conditions(50,40,30,'group1'),results,default='Other')
data.loc[data['Group']=='group2','Label'] = np.select(conditions(60,50,40,'group2'),results,default='Other')
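If the number of groups or bins keeps growing, one option is to move the thresholds into a config dict and let `pd.cut` do the binning. This is a minimal sketch that assumes a hypothetical `bins` dict and a hypothetical helper `label_values` (neither comes from the answer above). `right=False` makes the bins half-open `[a, b)`, matching the `>=` / `<` conditions used so far.

```python
import pandas as pd

data = pd.DataFrame({'Group': ['group1', 'group1', 'group1', 'group2', 'group2', 'group2'],
                     'Value': [30, 40, 10, 40, 60, 70]})

# Hypothetical config: per group, the bin edges and one label per bin.
# More groups or more bins only require editing this dict.
bins = {'group1': ([30, 40, 50], ['Small', 'Large']),
        'group2': ([40, 50, 60], ['Small', 'Large'])}

def label_values(values, edges, labels, default='Other'):
    # right=False makes each bin half-open [a, b)
    binned = pd.cut(values, bins=edges, labels=labels, right=False)
    # Values outside every bin come back as NaN; replace them with the default
    return binned.astype(object).fillna(default)

data['Label'] = 'Other'  # groups missing from `bins` keep the default
for group, (edges, labels) in bins.items():
    mask = data['Group'] == group
    data.loc[mask, 'Label'] = label_values(data.loc[mask, 'Value'], edges, labels)

print(data['Label'].tolist())
# ['Small', 'Large', 'Other', 'Small', 'Other', 'Other']
```

`pd.cut` needs monotonically increasing edges and exactly one label per bin (`len(labels) == len(edges) - 1`).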