Pandas - groupby column then specify condition

Time:05-03

I have a dataframe df that looks like this:

   Batch   Fruit  Property1  Property2  Property3
0      1   Apple         38         55         52
1      1  Banana         59         37         47
2      1   Pear          62         34         25
3      2   Apple         95         64         48
4      2  Banana         10         84         39
5      2   Pear          16         87         38
6      3   Apple         29         34         49
7      3  Banana         27         41         51
8      3   Pear          35         33         17
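For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the table, and the default integer index is assumed.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "Batch": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Fruit": ["Apple", "Banana", "Pear"] * 3,
    "Property1": [38, 59, 62, 95, 10, 16, 29, 27, 35],
    "Property2": [55, 37, 34, 64, 84, 87, 34, 41, 33],
    "Property3": [52, 47, 25, 48, 39, 38, 49, 51, 17],
})
print(df)
```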

I want to add a column 'Status' to this dataframe, whose value is either 'keep' or 'remove'. Every Fruit in a Batch should get 'Status' == 'keep' only when all of the following hold within that Batch:

  1. Apple has all Property1 < 30, Property2 < 40, Property3 < 50
  2. Banana has all Property1 < 35, Property2 < 45, Property3 < 55
  3. Pear has all Property1 < 37, Property2 < 46, Property3 < 53

Results should look like:

   Batch   Fruit  Property1  Property2  Property3 Status 
0      1   Apple         38         55         52  remove
1      1  Banana         59         37         47  remove
2      1   Pear          62         34         25  remove
3      2   Apple         95         64         48  remove
4      2  Banana         10         84         39  remove
5      2   Pear          16         87         38  remove
6      3   Apple         29         34         49    keep
7      3  Banana         27         41         51    keep
8      3   Pear          35         33         17    keep

CodePudding user response:

Try this:

    import numpy as np

    # Default every row to 'remove', then flip rows that meet their own fruit's limits to 'keep'
    df['Status'] = 'remove'
    df['Status'] = np.where((df['Property1']<30)&(df['Property2']<40)&(df['Property3']<50)&(df['Fruit']=='Apple'),'keep',df['Status'])
    df['Status'] = np.where((df['Property1']<35)&(df['Property2']<45)&(df['Property3']<55)&(df['Fruit']=='Banana'),'keep',df['Status'])
    df['Status'] = np.where((df['Property1']<37)&(df['Property2']<46)&(df['Property3']<53)&(df['Fruit']=='Pear'),'keep',df['Status'])
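Note that the code above decides each row independently, while the question asks for a whole Batch to be kept only if every fruit in it passes. One possible way to lift the row-level check to the batch level is `groupby(...).transform("all")`. The sketch below assumes the sample frame from the question; the names `row_ok` and `batch_ok` are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Batch": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Fruit": ["Apple", "Banana", "Pear"] * 3,
    "Property1": [38, 59, 62, 95, 10, 16, 29, 27, 35],
    "Property2": [55, 37, 34, 64, 84, 87, 34, 41, 33],
    "Property3": [52, 47, 25, 48, 39, 38, 49, 51, 17],
})

# Row-level check: does the row meet the limits for its own fruit?
apple = (df["Fruit"] == "Apple") & (df["Property1"] < 30) & (df["Property2"] < 40) & (df["Property3"] < 50)
banana = (df["Fruit"] == "Banana") & (df["Property1"] < 35) & (df["Property2"] < 45) & (df["Property3"] < 55)
pear = (df["Fruit"] == "Pear") & (df["Property1"] < 37) & (df["Property2"] < 46) & (df["Property3"] < 53)
row_ok = apple | banana | pear

# Batch-level check: True for every row of a batch only if all its rows pass
batch_ok = row_ok.groupby(df["Batch"]).transform("all")

df["Status"] = np.where(batch_ok, "keep", "remove")
print(df)
```

With the sample data both approaches give the same result, because each batch either passes or fails as a whole; they differ only when a batch contains a mix of passing and failing rows.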

CodePudding user response:

    def condition(x):
        # Return 'keep' when the row meets the limits for its own fruit
        if x['Fruit'] == 'Apple' and x['Property1'] < 30 and x['Property2'] < 40 and x['Property3'] < 50:
            return 'keep'
        elif x['Fruit'] == 'Banana' and x['Property1'] < 35 and x['Property2'] < 45 and x['Property3'] < 55:
            return 'keep'
        elif x['Fruit'] == 'Pear' and x['Property1'] < 37 and x['Property2'] < 46 and x['Property3'] < 53:
            return 'keep'
        else:
            return 'remove'

    # Evaluate the function once per row (axis=1)
    df['Status'] = df.apply(condition, axis=1)
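
If the list of fruits grows, the per-fruit branches could be replaced by a small lookup table of limits. The following is a sketch under assumptions, not part of the original answer: the `limits` frame and the `props`, `row_ok`, and `batch_ok` names are hypothetical, and every fruit in `df` is assumed to have an entry in `limits` (a missing fruit would raise a `KeyError`). It also applies the batch-level rule from the question via `groupby(...).transform("all")`.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Batch": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Fruit": ["Apple", "Banana", "Pear"] * 3,
    "Property1": [38, 59, 62, 95, 10, 16, 29, 27, 35],
    "Property2": [55, 37, 34, 64, 84, 87, 34, 41, 33],
    "Property3": [52, 47, 25, 48, 39, 38, 49, 51, 17],
})

props = ["Property1", "Property2", "Property3"]

# Hypothetical lookup table: one row of upper limits per fruit
limits = pd.DataFrame(
    {"Property1": [30, 35, 37], "Property2": [40, 45, 46], "Property3": [50, 55, 53]},
    index=["Apple", "Banana", "Pear"],
)

# Align the limits with each row of df by looking them up via the Fruit label
row_limits = limits.loc[df["Fruit"], props].to_numpy()

# A row passes when every property is strictly below its limit
row_ok = pd.Series((df[props].to_numpy() < row_limits).all(axis=1), index=df.index)

# A batch is kept only when all of its rows pass
batch_ok = row_ok.groupby(df["Batch"]).transform("all")
df["Status"] = np.where(batch_ok, "keep", "remove")
print(df)
```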