Pandas apply with a function as a keyword

Time:05-11

How can I pass an optional function as a keyword argument to another function that is called through DataFrame.apply?

For example, sometimes the dataframe I am working with has z in it:

import pandas as pd

df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [100, 200, 300, 400],
        "z": [0.1, 0.2, 0.3, 0.4],
    }
)
df


This case is easy because I can just use my function (much simplified here, of course):

def myfunc(dataframe):
    a = dataframe["z"] * 2
    
    return pd.Series(
        {
            "a": a,
        }
    )

To get my final dataframe with a:

output = df.apply(
    myfunc,
    axis=1,
)

df = df.join(output)
df


In the case where z is not in my starting dataframe, I need to call a function to calculate it. I can call several different functions to calculate it, and I want to be able to name a specific function. Here, I want to use the function computeZ.

def computeZ(dataframe):
    results = dataframe["x"] / 20
    return results

df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [100, 200, 300, 400],
    }
)
df


Here is the problem: I tried the code below, but I get TypeError: myfunc2() takes 1 positional argument but 2 were given. What is the correct way to do this? For various reasons, I need to use DataFrame.apply rather than working in numpy arrays or using anonymous functions.

def myfunc2(dataframe, **kwds):
    
    if analysis:
        res = globals()[analysis](dataframe["x"])
        a = res * 2
    else:
        a = dataframe["z"] * 2
    
    return pd.Series(
        {
            "a": a,
        }
    )

output = df.apply(
    myfunc2,
    axis=1,
    args=("computeZ",),
)

Answer:

It looks like you need a second positional parameter in the myfunc2 definition. With axis=1, the first argument passed is the row itself, followed by whatever you supply in args:

def myfunc2(row, p2):      
    return p2(row)

The value passed in args should then be the function object itself rather than its name as a string (computeZ, not "computeZ"):

output = df.apply(
    myfunc2,
    axis=1,
    args=(computeZ,)
)

output

0    0.05
1    0.10
2    0.15
3    0.20
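
If the function really does have to be chosen by name, as the globals() lookup in the question suggests, one option is an explicit dictionary instead of globals(). This is only a sketch: the ANALYSES registry and the analysis parameter name are hypothetical, and computeZ is written here to take the row, matching how it is called above.

```python
import pandas as pd

def computeZ(row):
    # Same calculation as in the question, applied to one row at a time.
    return row["x"] / 20

# Hypothetical registry mapping a name to the function that computes z.
ANALYSES = {"computeZ": computeZ}

def myfunc2(row, analysis):
    # `analysis` arrives as the second positional argument via args=(...).
    func = ANALYSES[analysis]
    return func(row)

df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [100, 200, 300, 400],
    }
)

# The string is looked up in ANALYSES rather than in globals().
output = df.apply(myfunc2, axis=1, args=("computeZ",))
```

An unknown name then raises a KeyError at the lookup, rather than silently resolving to whatever happens to be in the module namespace.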

Or you could do it using kwargs:

def myfunc2(row, **kwargs):
    return kwargs['your_func'](row)

output = df.apply(
    myfunc2,
    axis=1,
    your_func=computeZ
)

output

0    0.05
1    0.10
2    0.15
3    0.20
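
To get back to the original goal, where the function is optional and the result is a column a, the two ideas can be combined with a keyword argument that defaults to None. This is a minimal sketch rather than the only way to do it: the parameter name analysis is borrowed from the question, and df_no_z and df_with_z are hypothetical names for the two example dataframes.

```python
import pandas as pd

def computeZ(row):
    # Same calculation as in the question, applied to one row at a time.
    return row["x"] / 20

def myfunc2(row, analysis=None):
    # Use the supplied function when given; otherwise read the existing "z" value.
    z = analysis(row) if analysis is not None else row["z"]
    return pd.Series({"a": z * 2})

# Dataframe without "z": pass the function as a keyword argument to apply.
df_no_z = pd.DataFrame({"x": [1, 2, 3, 4], "y": [100, 200, 300, 400]})
out_no_z = df_no_z.apply(myfunc2, axis=1, analysis=computeZ)

# Dataframe with "z": omit the keyword and the column is used directly.
df_with_z = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [100, 200, 300, 400],
        "z": [0.1, 0.2, 0.3, 0.4],
    }
)
out_with_z = df_with_z.apply(myfunc2, axis=1)

# Either result can be joined back onto its source, as in the question.
df_no_z = df_no_z.join(out_no_z)
df_with_z = df_with_z.join(out_with_z)
```

Any keyword argument that DataFrame.apply does not recognise itself is forwarded to the applied function, which is why analysis=computeZ reaches myfunc2.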