I'd like to wrap a group-by-rolling-aggregation sequence in a function and pass the aggregation method itself, like mean or std, as a function argument, as in the code below:
df = pd.DataFrame({'date': ['2020-01-13', '2020-09-19', '2021-05-10', '2022-02-01'],
                   'provider': ['A', 'B', 'A', 'B'],
                   'points': [10, 2, 1, 8]})
def provider_rolling_window(df,ind,window_size,agg_method):
    s = df.sort_values(by=['provider','date'], ascending=True)\
        .groupby(['provider'])[ind]\
        .rolling(window_size, min_periods = 1)\
        .agg_method\
        .reset_index(drop=True,level=0)
    return(s)
df['moving_avg_3'] = provider_rolling_window(df,'points',3,mean)
However, the interpreter rejects this and complains:
---> 14 df['moving_avg_3'] = provider_rolling_window(df,'points',3,mean)
NameError: name 'mean' is not defined
Even if I try:
f = pd.groupby.rolling.mean
df['moving_avg_3'] = provider_rolling_window(df,'points',3,f)
It still complains:
AttributeError: module 'pandas' has no attribute 'groupby'
Is there a proper way to go about this?
CodePudding user response:
Instead, pass your agg_method as a string and call agg with it:
def provider_rolling_window(df,ind,window_size,agg_method):
    s = df.sort_values(by=['provider','date'], ascending=True)\
        .groupby(['provider'])[ind]\
        .rolling(window_size, min_periods = 1)\
        .agg(agg_method)\
        .reset_index(drop=True,level=0)
    return(s)
df['moving_avg_3'] = provider_rolling_window(df,'points',3,'mean')
Output:
>>> df
         date provider  points  moving_avg_3
0  2020-01-13        A      10          10.0
1  2020-09-19        B       2           2.0
2  2021-05-10        A       1           5.5
3  2022-02-01        B       8           5.0
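If you would rather call the rolling method directly than go through agg, one option is to look it up by name with getattr. This also shows why .agg_method failed in the question: attribute access uses the literal name agg_method, not the value of the variable. The sketch below is a variant of the function above, not something from the original post. It assumes agg_method is the name of a method that exists on the rolling object (such as 'mean' or 'std'), and the moving_std_3 column name is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2020-01-13', '2020-09-19', '2021-05-10', '2022-02-01'],
                   'provider': ['A', 'B', 'A', 'B'],
                   'points': [10, 2, 1, 8]})

def provider_rolling_window(df, ind, window_size, agg_method):
    # Build the per-provider rolling window once.
    rolling = (df.sort_values(by=['provider', 'date'], ascending=True)
                 .groupby('provider')[ind]
                 .rolling(window_size, min_periods=1))
    # Look the method up by its name, e.g. 'mean' -> rolling.mean, then call it.
    # Assumption: agg_method names a real method of the rolling object.
    result = getattr(rolling, agg_method)()
    # Drop the 'provider' index level so the result aligns with df's index.
    return result.reset_index(drop=True, level=0)

df['moving_avg_3'] = provider_rolling_window(df, 'points', 3, 'mean')
df['moving_std_3'] = provider_rolling_window(df, 'points', 3, 'std')
print(df)
```

Note that with std the first row of each provider comes out as NaN, because the sample standard deviation needs at least two observations in the window.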
CodePudding user response:
First of all, you need to pass an existing function, such as np.mean. A bare mean is not defined in Python itself, which is why you get the NameError.
One way to do this is to use the apply method. Your function would then look like this:
import numpy as np
def provider_rolling_window(df, ind, window_size, agg_method):
    s = df.sort_values(by=['provider','date'], ascending=True)\
        .groupby(['provider'])[ind]\
        .rolling(window_size, min_periods = 1)\
        .apply(agg_method)\
        .reset_index(drop=True,level=0)
    return(s)
df['moving_avg_3'] = provider_rolling_window(df, 'points', 3, np.mean)
print(df)
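A possible refinement, sketched under a few assumptions: the function could accept either a string or a callable, using agg for strings and apply for callables. The isinstance dispatch, the raw=True argument, and the range_3 column are my own additions for illustration, not part of the answer above. With raw=True, apply hands each window to the function as a NumPy array rather than a Series, which is typically faster for NumPy functions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2020-01-13', '2020-09-19', '2021-05-10', '2022-02-01'],
                   'provider': ['A', 'B', 'A', 'B'],
                   'points': [10, 2, 1, 8]})

def provider_rolling_window(df, ind, window_size, agg_method):
    rolling = (df.sort_values(by=['provider', 'date'], ascending=True)
                 .groupby('provider')[ind]
                 .rolling(window_size, min_periods=1))
    if isinstance(agg_method, str):
        # Named aggregation such as 'mean' or 'std'.
        result = rolling.agg(agg_method)
    else:
        # Arbitrary callable; raw=True passes each window as a NumPy array.
        result = rolling.apply(agg_method, raw=True)
    # Drop the 'provider' index level so the result aligns with df's index.
    return result.reset_index(drop=True, level=0)

df['moving_avg_3'] = provider_rolling_window(df, 'points', 3, np.mean)
# Hypothetical custom aggregation: spread between max and min in each window.
df['range_3'] = provider_rolling_window(df, 'points', 3, lambda x: x.max() - x.min())
print(df)
```

The callable route is more flexible, since any function that reduces an array to a number will do, but the string route is usually quicker because pandas can use its built-in rolling implementations.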