I have data that looks like this:
Thing quarter num_col1 num_col2
aaa 2010Q1 1.3 99.76
bbb 2010Q1 11.3 109.76
ccc 2010Q1 91.3 119.76
.....
.....
aaa 2019Q4 21.3 119.76
bbb 2019Q4 41.3 299.76
ccc 2019Q4 201.3 199.76
I need to group by Thing
column, and calculate moving-average for columns num_col1
and num_col2
for all the quarters.
Here's what I have tried till now:
## define moving-average function
N = 2
def pandas_rolling(x):
return pd.Series.rolling(x, window=N).mean()
## now group-by and calculate moving averages
things_groupby = df.groupby(by=['Thing'])
## below lines are giving incorrect values
df.loc[:,'num_col1_SMA'] = (things_groupby['num_col1'].apply(pandas_rolling)).values
df.loc[:,'num_col2_SMA'] = (things_groupby['num_col2'].apply(pandas_rolling)).values
But, when I do for one of the unique thing from Thing
column manually like shown below, it gives expected results.
pandas_rolling(df.loc[df.loc[:,'Topic']=='aaa'].loc[:,'num_col1']).values
what am I doing wrong with calculating the moving-average for individual group and then populating them in the dataframe? How do I do this properly?
CodePudding user response:
You can remove values
:
df['num_col1_SMA'] = things_groupby['num_col1'].apply(pandas_rolling)
df['num_col2_SMA'] = things_groupby['num_col2'].apply(pandas_rolling)
Or:
df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
.apply(pandas_rolling))
If possible without groupby.apply
is necessary remove first level of MultiIndex
:
df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
.rolling(window=N)
.mean()
.droplevel(0))