import pandas as pd
import numpy as np
np.random.seed(99)
rows = 10
df = pd.DataFrame({'A': np.random.choice(range(0, 2), rows, replace=True),
                   'B': np.random.choice(range(0, 2), rows, replace=True)})

def get_C1(row):
    return row.A row.B

def get_C2(row):
    return 'X' if row.A row.B == 0 else 'Y'

def get_C3(row):
    is_zero = row.A row.B
    return "X" if is_zero else "Y"

df = df.assign(C = lambda row: get_C3(row))
Why do the get_C2 and get_C3 functions raise an error?
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
CodePudding user response:
You're thinking that df.assign, when passed a function, behaves like df.apply with axis=1, which calls the function for each row.
That's incorrect. The documentation for df.assign describes the callable case as:
"Where the value is a callable, evaluated on df"
That means the function you pass to assign is called once, on the whole dataframe, rather than on each individual row.
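To see this for yourself, here is a minimal sketch. The small dataframe and the probe function are made up for illustration and are not from the question. It records what assign hands to the callable, then shows the same ValueError being raised when a whole column is used as a condition.

```python
import pandas as pd

# Hypothetical three-row frame, just for the demonstration.
df = pd.DataFrame({"A": [0, 1, 1], "B": [0, 0, 1]})

seen = []  # type names of whatever assign passes to the callable

def probe(arg):
    # Hypothetical helper: remember the argument's type, then return a
    # column so that assign has something to store under C.
    seen.append(type(arg).__name__)
    return arg.A

df.assign(C=probe)
print(seen)  # ['DataFrame'] : one call, with the whole frame

message = None
try:
    # A comparison on a column gives a boolean Series, which has no
    # single truth value.
    bool(df.A == 0)
except ValueError as exc:
    message = str(exc)
    print(message)
```

The callable runs exactly once, and the exception text is the one quoted in the question.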
So, in your function get_C3, the row parameter is not a row at all. It's a whole dataframe (and should be renamed to df or something else), and so row.A and row.B are two whole columns rather than single cell values.
Thus, is_zero is a whole column as well, and ... if is_zero ... will not work.
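Two possible ways around it are sketched below. Note that the operator between row.A and row.B is missing from the question as posted, so the | used here is only an assumption standing in for it; swap in whatever you actually have. The first option calls the function per row via df.apply with axis=1, and the second keeps assign's whole-dataframe callable but builds the column with np.where.

```python
import numpy as np
import pandas as pd

np.random.seed(99)
rows = 10
df = pd.DataFrame({
    "A": np.random.choice(range(0, 2), rows, replace=True),
    "B": np.random.choice(range(0, 2), rows, replace=True),
})

def get_C2(row):
    # ASSUMPTION: `|` stands in for the operator missing from the question.
    return "X" if (row.A | row.B) == 0 else "Y"

# Option 1: apply with axis=1 calls get_C2 once per row, so row.A and
# row.B are single values and the `if` has an unambiguous condition.
by_row = df.assign(C=df.apply(get_C2, axis=1))

# Option 2: let assign pass the whole frame (named d here) and make the
# choice element-wise with np.where instead of a Python `if`.
vectorized = df.assign(C=lambda d: np.where((d.A | d.B) == 0, "X", "Y"))

print(by_row)
```

Both give the same C column. The np.where version avoids a Python-level call per row, which usually matters on larger frames.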