Writing a complicated function that will be applied to a DataFrame

I need to write a complicated function that computes a new column for a pandas DataFrame. The function has to use data from many (more than 10) columns of this DataFrame.

It won't fit into a lambda that I could easily plug into apply(). I also don't want to write a function that takes more than 10 arguments and plug that into apply(), because it would hurt the readability of my code.

I would rather not use a for loop to iterate over the rows, because it performs poorly.

Is there a clever solution to this problem?

Answer:

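Write a function that takes a single row (passed in as a pandas Series) and pass it to apply() with the axis=1 argument. Inside the function you can read any column of that row by name, so it needs only one parameter.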

For example:

import pandas as pd

df = pd.DataFrame([[4, 9], ["x", "y"], [True, False]], columns=["A", "B"])
print(df)

#      A      B
#0     4      9
#1     x      y
#2  True  False 


def f(row):
    # row is one row of df as a Series, so columns are accessible by name
    if isinstance(row.A, bool):
        return "X"
    else:
        return row.A + row.B


df["C"] = df.apply(f, axis=1)
print(df)


#      A      B   C
#0     4      9  13
#1     x      y  xy
#2  True  False   X
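The same pattern scales to the 10+ columns in the question: the function keeps a single `row` parameter and reads whatever columns it needs by name. A minimal sketch, assuming hypothetical columns `price`, `qty`, `discount` and `member` and a made-up pricing rule (none of these come from the question):

```python
import pandas as pd

# Hypothetical columns; the real DataFrame would have 10+ of them.
df = pd.DataFrame({
    "price": [10.0, 20.0, 5.0],
    "qty": [1, 3, 10],
    "discount": [0.0, 0.1, 0.2],
    "member": [False, True, True],
})


def total(row):
    # row is a pandas Series indexed by column name, so this one parameter
    # gives access to every column without a long argument list.
    subtotal = row["price"] * row["qty"] * (1 - row["discount"])
    if row["member"]:
        subtotal -= 1.0  # hypothetical flat rebate for members
    return subtotal


df["total"] = df.apply(total, axis=1)
print(df)
```

Adding another input column only changes the body of `total`, not its signature or the apply() call.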

Answer:

If all the values you need are in the same row, you can use df.apply(func, axis=1) to pass each row of your DataFrame as the single argument to func. Inside func, you can then read any of the row's values by column name.
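One caveat about the performance concern in the question: apply(func, axis=1) still calls func once per row in Python, so it mainly improves readability rather than removing the per-row overhead of a loop. If the logic can be written as column arithmetic plus element-wise conditions, one alternative (vectorization, a different technique from the answers above) is a function that takes the whole DataFrame as its single argument and operates on entire columns. A sketch under that assumption, with hypothetical columns `a`, `b`, `c` and a made-up rule, alongside the equivalent row-wise version for comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical columns and rule, for illustration only.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [10, 20, 30, 40],
    "c": [5, 5, 50, 50],
})


def new_column(frame):
    """Vectorized: one argument (the DataFrame), whole columns at once."""
    s = frame["a"] + frame["b"]
    # np.where is an element-wise if/else: take s where s > c, otherwise c.
    return np.where(s > frame["c"], s, frame["c"])


def new_column_row(row):
    """Row-wise equivalent, called once per row by apply(axis=1)."""
    s = row["a"] + row["b"]
    return s if s > row["c"] else row["c"]


df["d"] = new_column(df)
row_result = df.apply(new_column_row, axis=1)
print(df)
print((row_result == df["d"]).all())
```

Whether this is possible depends on the real logic; branches that cannot be expressed element-wise may still need the row-wise function.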