I'm new to Python and I want to calculate the daily average temperature from hourly readings and then append it to the DataFrame as a new column, e.g.:
date | Temperature |
---|---|
2015-04-01 00:00:00 | 3.9 |
2015-04-01 01:00:00 | 2.10 |
2015-04-01 02:00:00 | 4.8 |
⋮ | ⋮ |
2015-10-31 23:00:00 | 2.16 |
I'm trying to get this output:
date | Temperature | averageT |
---|---|---|
2015-04-01 00:00:00 | 3.9 | 5 |
2015-04-01 01:00:00 | 2.10 | 5 |
2015-04-01 02:00:00 | 4.8 | 5 |
⋮ | ⋮ | ⋮ |
2015-10-31 23:00:00 | 2.16 | 7 |
CodePudding user response:
df['averageT'] = df.resample(rule='D', on='date').transform('mean')
CodePudding user response:
I used this as input data:
import pandas as pd

df = pd.DataFrame({
    'date': [
        pd.Timestamp('2015-04-01 00:00:00'),
        pd.Timestamp('2015-04-01 01:00:00'),
        pd.Timestamp('2015-04-01 02:00:00'),
        pd.Timestamp('2015-10-31 23:00:00'),
    ],
    'Temperature': [3.9, 2.1, 4.8, 2.16]
})
date Temperature
0 2015-04-01 00:00:00 3.90
1 2015-04-01 01:00:00 2.10
2 2015-04-01 02:00:00 4.80
3 2015-10-31 23:00:00 2.16
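Then use pd.Grouper to group the data day by day. With a plain .mean() aggregation, a day with no data gets a mean of NaN; transform('mean') only returns values for the rows that exist, so every row receives the mean of its own day.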
df['averageT'] = df.groupby(pd.Grouper(key='date', freq='1D'))['Temperature'].transform('mean')
print(df)
date Temperature averageT
0 2015-04-01 00:00:00 3.90 3.60
1 2015-04-01 01:00:00 2.10 3.60
2 2015-04-01 02:00:00 4.80 3.60
3 2015-10-31 23:00:00 2.16 2.16
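To make the difference between aggregating and transforming concrete, here is a minimal sketch. The three sample rows are made up for illustration (they are not the question's data) and deliberately skip 2015-04-02, and the names `grouped` and `daily` are just placeholders:

```python
import pandas as pd

# Hypothetical sample: two readings on 2015-04-01, none on 2015-04-02,
# one reading on 2015-04-03.
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2015-04-01 00:00:00',
        '2015-04-01 01:00:00',
        '2015-04-03 00:00:00',
    ]),
    'Temperature': [4.0, 6.0, 7.0],
})

grouped = df.groupby(pd.Grouper(key='date', freq='1D'))['Temperature']

# Aggregation: one row per calendar day in the range, including the empty
# day, whose mean is NaN.
daily = grouped.mean()
print(daily)

# Transformation: one value per original row, aligned with df's index,
# so it can be assigned straight back as a new column.
df['averageT'] = grouped.transform('mean')
print(df)
```

Here `daily` has three entries (with NaN for 2015-04-02), while `averageT` has exactly one value per original row.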