tavg tmin tmax prcp
year
1990-01-01 9.4 6.0 15.1 0.0
1990-01-02 17.9 7.3 24.3 0.0
1990-01-03 14.9 8.3 22.1 0.0
1990-01-04 20.9 14.6 28.2 0.0
1990-01-05 29.8 25.0 36.0 0.3
Expected output:
tavg tmin tmax prcp
year
1990 9.2 5.0 15.1 0.0
1991 16.4 6.3 24.3 0.0
1992 12.3 8.3 22.1 0.0
1993 21.7 14.6 28.2 0.0
1994 28.6 25.0 36.0 0.3
CodePudding user response:
First, convert the year column to datetime64[ns]. Assuming df is your DataFrame:
df["year"] = pd.to_datetime(df["year"], format='%Y-%m-%d')
Then group by the calendar year and take the mean:
df = df.groupby(df["year"].dt.year).mean(numeric_only=True)
The .dt accessor only works on datetime-like columns, so the conversion has to come first. If year is the index rather than a column, convert it with df.index = pd.to_datetime(df.index) and group by df.index.year instead. Good luck!
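In case the dates are the index (the printout in the question suggests an index named year), here is a minimal sketch of that variant. The DataFrame is rebuilt by hand from the five sample rows, and the name yearly is only illustrative:

```python
import pandas as pd

# Sample rows from the question; the dates are the index, named "year".
df = pd.DataFrame(
    {
        "tavg": [9.4, 17.9, 14.9, 20.9, 29.8],
        "tmin": [6.0, 7.3, 8.3, 14.6, 25.0],
        "tmax": [15.1, 24.3, 22.1, 28.2, 36.0],
        "prcp": [0.0, 0.0, 0.0, 0.0, 0.3],
    },
    index=pd.Index(
        ["1990-01-01", "1990-01-02", "1990-01-03", "1990-01-04", "1990-01-05"],
        name="year",
    ),
)

# Convert the string index to a DatetimeIndex so that .year is available.
df.index = pd.to_datetime(df.index, format="%Y-%m-%d")

# Group by calendar year and average every column.
yearly = df.groupby(df.index.year).mean()
print(yearly)
```

All five sample rows fall in 1990, so this prints a single row; with the full data set you get one row per year.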
CodePudding user response:
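Extract the year from the year column with .dt.year and aggregate with mean (this assumes the column is already datetime64[ns]; if it holds strings, convert it with pd.to_datetime first):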
print (df)
year tavg tmin tmax prcp
0 1990-01-01 9.4 6.0 15.1 0.0
1 1990-01-02 17.9 7.3 24.3 0.0
2 1991-01-03 14.9 8.3 22.1 0.0
3 1991-01-04 20.9 14.6 28.2 0.0
4 1992-01-05 29.8 25.0 36.0 0.3
# numeric_only=True keeps the original datetime column out of the result
df = df.groupby(df['year'].dt.year).mean(numeric_only=True)
print (df)
tavg tmin tmax prcp
year
1990 13.65 6.65 19.70 0.0
1991 17.90 11.45 25.15 0.0
1992 29.80 25.00 36.00 0.3
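For anyone who wants to reproduce this end to end, here is a self-contained sketch. It assumes the year column starts out as strings, so it converts first; the frame is rebuilt by hand from the rows printed above, and yearly is just an illustrative name:

```python
import pandas as pd

# Same rows as printed above, with "year" as a regular string column.
df = pd.DataFrame(
    {
        "year": ["1990-01-01", "1990-01-02", "1991-01-03", "1991-01-04", "1992-01-05"],
        "tavg": [9.4, 17.9, 14.9, 20.9, 29.8],
        "tmin": [6.0, 7.3, 8.3, 14.6, 25.0],
        "tmax": [15.1, 24.3, 22.1, 28.2, 36.0],
        "prcp": [0.0, 0.0, 0.0, 0.0, 0.3],
    }
)

# Strings have no .dt accessor, so convert to datetime64[ns] first.
df["year"] = pd.to_datetime(df["year"], format="%Y-%m-%d")

# Group on the extracted calendar year; numeric_only=True leaves the
# datetime column itself out of the averaged result.
yearly = df.groupby(df["year"].dt.year).mean(numeric_only=True)
print(yearly)
```

Passing numeric_only=True is a choice made here so the result holds only the four measurement columns, whichever pandas version is installed.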