I want to create a column where several columns can be greater than one but one column has to be 0 at all times e.g. :
df['indicator'] = np.where(( (df['01'] > 0) | (df['02']> 0) | (df['03']> 0) | (df['04']> 0)
& (df['spend'] == 0 )), 1, 0)
I want to create this flag based on whether if either of columns 01 to 04 are greater than 0 then 1 else 0. But whilst each of these are > 0 the spend column must be kept at 0 in all cases. This means if 01 and 02 are > 0 then spend must be 0, etc.
However using the above logic i end up with cases where spend is > 0 - what am i missing ?
CodePudding user response:
personally, when working with multiple conditions in a data frame, I use masks: stackoverflow post about masks
col_1_idx = df['01'] > 0
col_2_idx = df['02'] > 0
col_3_idx = df['03'] > 0
col_4_idx = df['04'] > 0
or_col_idx = col_1_idx | col_2_idx | col_3_idx | col_4_idx
spend_idx = df['spend'] == 0
df['indicator'] = np.where(df[or_col_idx & spend_idx]), 1, 0)
CodePudding user response:
IIUC, this can be simplified to:
df['indicator'] = (df[['01','02','03','04']].gt(0).any(axis=1) & df['spend'].eq(0)).astype(int)
I use the .gt(), .lt(), .eq(), .le()
etc. a lot to simplify these ()
we run into.
You really don't need np.where
when your desired output is essentially a numeric Boolean.