I need to write a complicated function that computes a new column for a pandas DataFrame.
This function has to use data from multiple (more than 10) columns of the DataFrame.
It won't fit into a lambda that I could simply plug into apply().
I don't want to write a function that takes more than 10 arguments and pass it to apply(), because that would hurt the readability of my code.
I would rather not use a for loop to iterate over the rows, as it performs poorly.
Is there a clever solution to this problem?
CodePudding user response:
Simply write a function that takes a row as input and pass it to apply() with the axis=1 argument.
For example:
import pandas as pd

df = pd.DataFrame([[4, 9], ["x", "y"], [True, False]], columns=["A", "B"])
print(df)
#       A      B
# 0     4      9
# 1     x      y
# 2  True  False

def f(row):
    # Rows whose A value is a bool get a fixed label; otherwise combine A and B.
    if type(row.A) is bool:
        return "X"
    else:
        return row.A + row.B

df["C"] = df.apply(f, axis=1)
print(df)
#       A      B   C
# 0     4      9  13
# 1     x      y  xy
# 2  True  False   X
CodePudding user response:
If all the values you need are in the same row, you can use apply(func, axis=1) to pass each row of your df as the argument to the function func. Then, inside func, you can extract whatever values you need from that row.
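As a rough sketch of what extracting values inside func might look like, the example below reads several columns by name from the row. The column names (price, qty, discount, region) and the calculation itself are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical data: the columns and the formula below are assumptions.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0],
    "qty": [1, 2, 3],
    "discount": [0.0, 0.1, 0.5],
    "region": ["EU", "US", "EU"],
})

def func(row):
    # The whole row arrives as a Series, so any number of columns can be
    # read by name without adding parameters to the function.
    total = row["price"] * row["qty"] * (1 - row["discount"])
    if row["region"] == "EU":
        total *= 1.2  # illustrative adjustment, not from the original post
    return total

# axis=1 calls func once per row and collects the results into a new column.
df["total"] = df.apply(func, axis=1)
```

Because func takes a single argument, its signature stays the same however many columns it ends up using.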