I have the following dataframe:
import pandas as pd
d = {'quarter': ['1996q1', '1996q2', '1996q3', '1996q4', '1997q1', '1997q2', '1997q3', '1997q4', '1996q1', '1996q2', '1996q3', '1996q4', '1997q1', '1997q2', '1997q3', '1997q4'], 'country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'ITA', 'ITA', 'ITA', 'ITA', 'ITA', 'ITA', 'ITA', 'ITA'], 'value': [0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 5, 0, 0, 0, 2, 2]}
df = pd.DataFrame(data=d)
df
   quarter country  value
0   1996q1     USA      0
1   1996q2     USA      0
2   1996q3     USA      0
3   1996q4     USA      0
4   1997q1     USA      1
5   1997q2     USA      1
6   1997q3     USA      1
7   1997q4     USA      1
8   1996q1     ITA      3
9   1996q2     ITA      3
10  1996q3     ITA      5
11  1996q4     ITA      0
12  1997q1     ITA      0
13  1997q2     ITA      0
14  1997q3     ITA      2
15  1997q4     ITA      2
I want to compute the annual mean of each country's four quarterly values, replacing the quarter column with a year column and keeping one row per country and year, to give:
d2 = {'year': ['1996', '1997', '1996', '1997'], 'country': ['USA', 'USA', 'ITA', 'ITA'], 'value': [0, 1, 2.75, 1]}
df2 = pd.DataFrame(data=d2)
df2
   year country  value
0  1996     USA   0.00
1  1997     USA   1.00
2  1996     ITA   2.75
3  1997     ITA   1.00
CodePudding user response:
First, convert your quarter column to a pd.PeriodIndex. To get the year from each Period, use the year accessor (PeriodIndex.year). Then group by year and country.
df.assign(year=pd.PeriodIndex(df.quarter, freq="Q").year).groupby(
["year", "country"], as_index=False, sort=False
)["value"].mean()
   year country  value
0  1996     USA   0.00
1  1997     USA   1.00
2  1996     ITA   2.75
3  1997     ITA   1.00
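For readers who prefer to see the intermediate values, the same chain can be unrolled into separate steps. This is only a sketch of the approach above, not a different method. The names quarters, years and annual are made up for illustration, and the dataframe is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
d = {
    "quarter": ["1996q1", "1996q2", "1996q3", "1996q4",
                "1997q1", "1997q2", "1997q3", "1997q4"] * 2,
    "country": ["USA"] * 8 + ["ITA"] * 8,
    "value": [0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 5, 0, 0, 0, 2, 2],
}
df = pd.DataFrame(data=d)

# Step 1: parse the quarter labels as quarterly periods.
quarters = pd.PeriodIndex(df["quarter"], freq="Q")

# Step 2: extract the calendar year of each period (integers).
years = quarters.year

# Step 3: attach the year as a column, then average within each
# (year, country) pair. sort=False keeps the groups in order of
# first appearance, so USA rows come before ITA rows.
annual = (
    df.assign(year=years)
    .groupby(["year", "country"], as_index=False, sort=False)["value"]
    .mean()
)
print(annual)
```

Note that year comes out as an integer column here, whereas the expected df2 in the question stores it as strings. If that matters, annual["year"].astype(str) should convert it.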
CodePudding user response:
You can group by multiple series at once. In your case, you need two series: one for the year, which can be obtained from quarter, and another for country.
grp = [pd.to_datetime(df.quarter.rename("year")).dt.strftime("%Y"), df.country.str.strip()]
print(df.groupby(grp).value.mean().reset_index())
# year country value
# 0 1996 ITA 2.75
# 1 1996 USA 0.00
# 2 1997 ITA 1.00
# 3 1997 USA 1.00
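The output above is sorted by the group keys, so the rows come out in a different order from the one requested in the question. A possible variation, sketched below, passes sort=False to keep the order of first appearance. It also swaps the date parsing for plain string slicing, which is a different technique from the one in this answer and assumes every quarter label starts with a four-digit year (as in "1996q1"). The names year and annual are illustrative only.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
d = {
    "quarter": ["1996q1", "1996q2", "1996q3", "1996q4",
                "1997q1", "1997q2", "1997q3", "1997q4"] * 2,
    "country": ["USA"] * 8 + ["ITA"] * 8,
    "value": [0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 5, 0, 0, 0, 2, 2],
}
df = pd.DataFrame(data=d)

# Assumption: the first four characters of each label are the year.
year = df["quarter"].str[:4].rename("year")

# Group by the two series directly; sort=False preserves the order in
# which each (year, country) pair first appears in the data.
annual = (
    df.groupby([year, df["country"]], sort=False)["value"]
    .mean()
    .reset_index()
)
print(annual)
```

With this approach the year column holds strings, which matches the dtype used in the question's df2.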