Keep one column constant but create case where all other columns a value (using OR) pandas-CodePudding

I want to create a column where several columns can be greater than one but one column has to be 0 at all times e.g. :

df['indicator'] = np.where(( (df['01'] > 0) | (df['02']> 0)  | (df['03']> 0) | (df['04']> 0) 
               & (df['spend'] == 0 )), 1, 0)

I want to create this flag based on whether if either of columns 01 to 04 are greater than 0 then 1 else 0. But whilst each of these are > 0 the spend column must be kept at 0 in all cases. This means if 01 and 02 are > 0 then spend must be 0, etc.

However using the above logic i end up with cases where spend is > 0 - what am i missing ?

CodePudding user response：

personally, when working with multiple conditions in a data frame, I use masks: stackoverflow post about masks

col_1_idx = df['01'] > 0
col_2_idx = df['02'] > 0
col_3_idx = df['03'] > 0
col_4_idx = df['04'] > 0
or_col_idx = col_1_idx | col_2_idx | col_3_idx | col_4_idx

spend_idx = df['spend'] == 0

df['indicator'] = np.where(df[or_col_idx & spend_idx]), 1, 0)

CodePudding user response：

IIUC, this can be simplified to:

df['indicator'] = (df[['01','02','03','04']].gt(0).any(axis=1) & df['spend'].eq(0)).astype(int)

I use the .gt(), .lt(), .eq(), .le() etc. a lot to simplify these () we run into.

You really don't need np.where when your desired output is essentially a numeric Boolean.