I am trying to process a df based on some of its columns (col1 to col4 here) and add a new column based on conditions defined in a function. The code may be clearer than an explanation in plain English:
def get_new_col(df):
    if (df['col1'] == 0) | (df['col2'] == 0):
        first_half = 0
    else:
        first_half = min(df['col1'], df['col2'])
    if (df['col3'] == 0) | (df['col4'] == 0):
        second_half = 0
    else:
        second_half = min(df['col3'], df['col4'])
    return first_half + second_half

df['new_col'] = get_new_col(df)
My problem is that I am getting a ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(), even though I am properly (I think?) bracketing the conditions of the if statements and using the bitwise operator | instead of or, as suggested in this other thread.
Any idea on how to solve this?
CodePudding user response:
IIUC, you need DataFrame.apply on rows:
def get_new_col(row):
    # Each value here is a scalar from a single row, so `if` works.
    if (row['col1'] == 0) | (row['col2'] == 0):
        first_half = 0
    else:
        first_half = min(row['col1'], row['col2'])
    if (row['col3'] == 0) | (row['col4'] == 0):
        second_half = 0
    else:
        second_half = min(row['col3'], row['col4'])
    return first_half + second_half

df['new_col'] = df.apply(get_new_col, axis=1)
The problem is that df['col1'] == 0 returns a boolean Series, which an if statement cannot evaluate as a single True or False.
With df.apply(get_new_col, axis=1), only one row of the DataFrame is passed to get_new_col at a time, so each comparison yields a single boolean.
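A minimal sketch of the difference, using a made-up two-row frame (the sample values and the helper name is_zero_row are assumptions for illustration only). Comparing a whole column gives one boolean per row, which `if` rejects, while a lookup on a single row gives one value:

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"col1": [0, 3], "col2": [5, 2]})

# Comparing whole columns gives a boolean Series, one value per row.
mask = (df["col1"] == 0) | (df["col2"] == 0)

# Using that Series as an `if` condition raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Inside apply(axis=1), row["col1"] is a single value, so `if` works.
def is_zero_row(row):
    if (row["col1"] == 0) | (row["col2"] == 0):
        return True
    return False

row_flags = df.apply(is_zero_row, axis=1).tolist()
```

Here `raised` ends up True for the column-level check, while the row-level function runs without error.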
To do the same without apply, you can try:
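def get_new_col(df):
    # Row-wise minimum of each pair, forced to 0 where either column is 0.
    first_half = df[['col1', 'col2']].min(axis=1)
    first_half = first_half.mask(df[['col1', 'col2']].eq(0).any(axis=1), 0)
    second_half = df[['col3', 'col4']].min(axis=1)
    second_half = second_half.mask(df[['col3', 'col4']].eq(0).any(axis=1), 0)
    return first_half + second_half

df['new_col'] = get_new_col(df)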
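As a quick sanity check, the row-wise and column-wise approaches can be compared on a small made-up frame. This is only a sketch: the sample values and the names half, column_wise and row_wise are hypothetical, and it assumes the intended result is the sum of the two halves.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "col1": [0, 3, 4],
    "col2": [5, 2, 7],
    "col3": [1, 0, 6],
    "col4": [2, 9, 8],
})

def half(frame, a, b):
    # Row-wise minimum of two columns, set to 0 where either one is 0.
    smallest = frame[[a, b]].min(axis=1)
    return smallest.mask(frame[[a, b]].eq(0).any(axis=1), 0)

def column_wise(frame):
    # Vectorized version: operates on whole columns at once.
    return half(frame, "col1", "col2") + half(frame, "col3", "col4")

def row_wise(row):
    # Per-row version, meant to be used with DataFrame.apply(axis=1).
    first = 0 if (row["col1"] == 0 or row["col2"] == 0) else min(row["col1"], row["col2"])
    second = 0 if (row["col3"] == 0 or row["col4"] == 0) else min(row["col3"], row["col4"])
    return first + second

vectorized = column_wise(df)
looped = df.apply(row_wise, axis=1)
```

Both results should match row by row. On larger frames the column-wise version is usually faster, since apply with axis=1 calls the Python function once per row.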