I want to label each row by its value. Here is what I have done:
import numpy as np
import pandas as pd
data=pd.DataFrame({'Group':['group1','group1','group1','group2','group2','group2'],
'Value':[30,40,10,40,60,70]})
conditions=[(data['Value']<50) & (data['Value']>=40),
(data['Value']<40) & (data['Value']>=30)]
results=['Large','Small']
data['Label']=np.select(conditions,results,default='Other')
It works fine. However, my goal is to label by group, with different thresholds for each group. I can of course do the following:
data=pd.DataFrame({'Group':['group1','group1','group1','group2','group2','group2'],
'Value':[30,40,10,40,60,70]})
conditions=[(data.loc[data['Group']=='group1','Value']<50) & (data.loc[data['Group']=='group1','Value']>=40),
(data.loc[data['Group']=='group1','Value']<40) & (data.loc[data['Group']=='group1','Value']>=30)]
results=['Large','Small']
data.loc[data['Group']=='group1','Label']=np.select(conditions,results,default='Other')
# Reuse the same DataFrame for group2; re-creating it here would discard the group1 labels
conditions=[(data.loc[data['Group']=='group2','Value']<60) & (data.loc[data['Group']=='group2','Value']>=50),
(data.loc[data['Group']=='group2','Value']<50) & (data.loc[data['Group']=='group2','Value']>=40)]
results=['Large','Small']
data.loc[data['Group']=='group2','Label']=np.select(conditions,results,default='Other')
But I am looking for a more elegant solution, especially because my real dataset has more groups and more conditions.
Answer:
You can generalize with a function:
def conditions(x,y,z,group):
    # Values for the rows of this group only (uses the global `data`)
    values=data.loc[data['Group']==group,'Value']
    return [(values<x) & (values>=y),  # 'Large': y <= Value < x
            (values<y) & (values>=z)]  # 'Small': z <= Value < y
results=['Large','Small']
data.loc[data['Group']=='group1','Label'] = np.select(conditions(50,40,30,'group1'),results,default='Other')
data.loc[data['Group']=='group2','Label'] = np.select(conditions(60,50,40,'group2'),results,default='Other')
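If the number of groups and conditions keeps growing, one possible way to avoid a separate line per group is to keep the thresholds in a lookup table and loop over it. The following is only a sketch under assumptions: the `thresholds` dict, its `(lower, upper, label)` layout, and the helper `label_group` are hypothetical names introduced here, not part of the question.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Group': ['group1', 'group1', 'group1', 'group2', 'group2', 'group2'],
    'Value': [30, 40, 10, 40, 60, 70],
})

# Hypothetical lookup table: for each group, a list of (lower, upper, label)
# rules, each meaning lower <= Value < upper.
thresholds = {
    'group1': [(40, 50, 'Large'), (30, 40, 'Small')],
    'group2': [(50, 60, 'Large'), (40, 50, 'Small')],
}

def label_group(df, rules, default='Other'):
    """Label the rows of one group according to its list of rules."""
    conditions = [(df['Value'] >= lo) & (df['Value'] < hi) for lo, hi, _ in rules]
    labels = [label for _, _, label in rules]
    # Keep the original index so the result aligns when assigned back
    return pd.Series(np.select(conditions, labels, default=default), index=df.index)

# Rows whose group has no entry in `thresholds` keep the default label
data['Label'] = 'Other'
for group, rules in thresholds.items():
    mask = data['Group'] == group
    data.loc[mask, 'Label'] = label_group(data[mask], rules)

print(data)
```

With this layout, adding a group or a condition means adding an entry to `thresholds` rather than writing another `np.select` call.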