Passing a pandas rolling aggregation method as a function argument

I'd like to wrap a group-by, rolling, aggregate sequence in a function that takes the aggregation method itself (such as mean or std) as an argument, as in the code below:

import pandas as pd

df = pd.DataFrame({'date': ['2020-01-13', '2020-09-19', '2021-05-10', '2022-02-01'],
                   'provider': ['A', 'B', 'A', 'B'],
                   'points': [10, 2, 1, 8]})

def provider_rolling_window(df,ind,window_size,agg_method):
    s = df.sort_values(by=['provider','date'], ascending=True)\
                .groupby(['provider'])[ind]\
                .rolling(window_size, min_periods = 1)\
                .agg_method\
                .reset_index(drop=True,level=0)
    return(s)
    
df['moving_avg_3'] =  provider_rolling_window(df,'points',3,mean)

However, the interpreter rejects this and complains:

---> 14 df['moving_avg_3'] =  provider_rolling_window(df,'points',3,mean)

NameError: name 'mean' is not defined

Even if I try:

f = pd.groupby.rolling.mean
df['moving_avg_3'] =  provider_rolling_window(df,'points',3,f)

It still complains:

AttributeError: module 'pandas' has no attribute 'groupby'

Is there a proper way to go about this?

CodePudding user response：

Instead, pass agg_method as a string and call agg with it:

def provider_rolling_window(df,ind,window_size,agg_method):
    s = df.sort_values(by=['provider','date'], ascending=True)\
                .groupby(['provider'])[ind]\
                .rolling(window_size, min_periods = 1)\
                .agg(agg_method)\
                .reset_index(drop=True,level=0)
    return(s)
    
df['moving_avg_3'] =  provider_rolling_window(df,'points',3,'mean')

Output:

>>> df
         date provider  points  moving_avg_3
0  2020-01-13        A      10          10.0
1  2020-09-19        B       2           2.0
2  2021-05-10        A       1           5.5
3  2022-02-01        B       8           5.0
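
A possible alternative, not part of the answer above: since the string is just the name of a method on the rolling object, it can also be looked up with Python's built-in getattr and then called. This is a minimal sketch under that assumption, reusing the question's sample data; the moving_max_3 column is only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2020-01-13', '2020-09-19', '2021-05-10', '2022-02-01'],
                   'provider': ['A', 'B', 'A', 'B'],
                   'points': [10, 2, 1, 8]})

def provider_rolling_window(df, ind, window_size, agg_method):
    # Build the grouped rolling object first, without aggregating yet.
    rolling = (df.sort_values(by=['provider', 'date'])
                 .groupby('provider')[ind]
                 .rolling(window_size, min_periods=1))
    # Look the method up by name (e.g. rolling.mean) and call it.
    result = getattr(rolling, agg_method)()
    # Drop the 'provider' index level so the result aligns with df's index.
    return result.reset_index(level=0, drop=True)

df['moving_avg_3'] = provider_rolling_window(df, 'points', 3, 'mean')
df['moving_max_3'] = provider_rolling_window(df, 'points', 3, 'max')  # illustrative extra column
print(df)
```

Compared with agg('mean'), this raises a plain AttributeError when the name is not a method of the rolling object, which may or may not be the error handling you want.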

CodePudding user response：

First of all, you need to pass a function that actually exists, such as np.mean. A bare mean is not defined in Python itself.

One way to do this is with the rolling object's apply method, so your function would look like this:

import numpy as np

def provider_rolling_window(df, ind, window_size, agg_method):
    s = df.sort_values(by=['provider','date'], ascending=True)\
                .groupby(['provider'])[ind]\
                .rolling(window_size, min_periods = 1)\
                .apply(agg_method)\
                .reset_index(drop=True,level=0)
    return(s)

df['moving_avg_3'] =  provider_rolling_window(df, 'points', 3, np.mean)
print(df)
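
As a rough sketch of where apply may be the better fit: it accepts any callable that reduces a window to a single number, so it also covers aggregations that have no built-in rolling method. The window_range helper below is hypothetical and only illustrates the idea, and raw=True is an optional choice here that hands each window to the function as a NumPy array instead of a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2020-01-13', '2020-09-19', '2021-05-10', '2022-02-01'],
                   'provider': ['A', 'B', 'A', 'B'],
                   'points': [10, 2, 1, 8]})

def provider_rolling_window(df, ind, window_size, agg_method):
    s = (df.sort_values(by=['provider', 'date'])
           .groupby('provider')[ind]
           .rolling(window_size, min_periods=1)
           .apply(agg_method, raw=True)  # raw=True: each window arrives as an ndarray
           .reset_index(level=0, drop=True))
    return s

def window_range(window):
    # Hypothetical custom aggregation: spread between largest and smallest value.
    return window.max() - window.min()

df['moving_avg_3'] = provider_rolling_window(df, 'points', 3, np.mean)
df['moving_range_3'] = provider_rolling_window(df, 'points', 3, window_range)
print(df)
```

Because apply calls a Python function once per window, it is generally slower than the built-in rolling methods, so for standard aggregations such as mean or std the string form is usually the more practical option.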