Home > database >  pandas column set value using multiple columns and multiple conditions
pandas column set value using multiple columns and multiple conditions

Time:05-25

Considering this sample data:

data = {'January': [1,2,2,1,1], 'February': [1,1,2,0,0], 'March': [1,1,2,1,1], 'April': [1,1,2,0,0], 'May': [1,1,2,1,2], 'Frailty':['Moderate', 'Severe', 'Severe', 'Mild', 'Severe']}
sam_data = pd.DataFrame(data)

January February    March   April   May Frailty
1         1         1         1      1  Moderate
2         1         1         1      1  Severe
2         2         2         2      2  Severe
1         0         1         0      1  Mild
1         0         1         0      2  Severe

I am attempting to create a last column, 'Met_Gap' which uses the following logic:

If Frailty = 'Severe' and each column (January-May) has at least a value of 2, then 'Met', otherwise 'Gap' AND If Frailty 1= 'Severe' and each column (January-May) has at least a value of 1, then 'Met' otherwise 'Gap'

I cannot just add those columns and get a total because to achieve 'Met' there has to be a value of some kind other than 0 in each monthly column.

Tried this below, no luck.

df['Met_Gap'] = np.where((df[df.columns[:5]] >= 10) & (df['Frailty'] == 'Severe'), 'Met', 'Gap')

Sort of stuck given that I have evaluate each column whether it is not 0 and if one column is a 0 it is automatically a 'Gap.' However, if all columns are non-zero, then for 'Severe' they have to be at least a 2 and for other categories, at least a 1.

Expected output:

 January  February  March   April   May Frailty   Met_Gap
     1       1        1      1       1  Moderate    Met
     2       1        1      1       1  Severe      Gap
     2       2        2      2       2  Severe      Met
     1       0        1      0       1  Mild        Gap
     1       0        1      0       2  Severe      Gap

I am sure there is someone creative out there who has come across this issue before. Thanks for taking a look.

CodePudding user response:

IIUC, you can use boolean masks and numpy.where/numpy.select:

cols = list(sam_data.columns[:5])
# ['January', 'February', 'March', 'April', 'May']

s = sam_data[cols].min(axis=1) # min value per row
m = sam_data['Frailty'].eq('Severe')

sam_data['result'] = np.where((m&s.ge(2))|((~m)&s.ge(1)), 'Met', 'Gap')

# alternative to set different values Met1/Met2 for example
#sam_data['result'] = np.select([m&s.ge(2), (~m)&s.ge(1)], ['Met', 'Met'], 'Gap')

output:

   January  February  March  April  May   Frailty result
0        1         1      1      1    1  Moderate    Met
1        2         1      1      1    1    Severe    Gap
2        2         2      2      2    2    Severe    Met
3        1         0      1      0    1      Mild    Gap
4        1         0      1      0    2    Severe    Gap
  • Related