I have the following Pandas dataframe with GDP% by country/year. I have two countries, US (with 6 rows) and FR (with 4 rows).
I need to get the 3-period moving average of the GDP% by country. If I use the rolling window function, I get the first FR moving average taking the last two moving average from US, and that's wrong. How can I roll the data grouped by country?
df = pd.DataFrame({
'country': ['US','US','US','US', 'US', 'US', 'FR','FR','FR','FR'],
'year': [1990, 1991, 1992, 1993, 1994, 1995, 1990, 1991, 1992, 1993],
'gdp': [1.2, 1.4, 1.7, 2.1, 2.3, 1.9, 4.1, 4.6, 4.3, 4.4]
})
print(df)
country year gdp
0 US 1990 1.2
1 US 1991 1.4
2 US 1992 1.7
3 US 1993 2.1
4 US 1994 2.3
5 US 1995 1.9
6 FR 1990 4.1
7 FR 1991 4.6
8 FR 1992 4.3
9 FR 1993 4.4
df2 = df['gdp'].rolling(3, min_periods=1).mean()
print(df2)
0 1.200000
1 1.300000
2 1.433333
3 1.733333
4 2.033333
5 2.100000
6 2.766667
7 3.533333
8 4.333333
9 4.433333
CodePudding user response:
Try this:
df['rolling 3 mean'] = df.groupby('country', group_keys=False)['gdp']\
.rolling(3, min_periods=1).mean().droplevel
Output:
country year gdp rolling 3 mean
0 US 1990 1.2 1.200000
1 US 1991 1.4 1.300000
2 US 1992 1.7 1.433333
3 US 1993 2.1 1.733333
4 US 1994 2.3 2.033333
5 US 1995 1.9 2.100000
6 FR 1990 4.1 4.100000
7 FR 1991 4.6 4.350000
8 FR 1992 4.3 4.333333
9 FR 1993 4.4 4.433333