My datas look like this:
datetime kwh temperature
2021-03-01 23:00:00 01:00 1.276584 8.000000
2021-03-01 23:30:00 01:00 0.833173 8.000000
2021-03-02 00:00:00 01:00 1.480995 7.000000
2021-03-02 00:30:00 01:00 1.094157 8.000000
2021-03-02 01:00:00 01:00 1.221725 7.000000
2021-03-02 01:30:00 01:00 0.893175 6.000000
2021-03-02 02:00:00 01:00 0.663352 7.000000
2021-03-02 02:30:00 01:00 0.984540 6.500000
2021-03-02 03:00:00 01:00 1.332855 6.000000
2021-03-02 03:30:00 01:00 1.108477 6.000000
2021-03-02 04:00:00 01:00 1.219103 6.000000
2021-03-02 04:30:00 01:00 1.455885 5.000000
2021-03-02 05:00:00 01:00 1.758719 4.000000
2021-03-02 05:30:00 01:00 1.679657 4.000000
2021-03-02 23:00:00 01:00 2.482981 9.000000
2021-03-02 23:30:00 01:00 0.830652 8.000000
2021-03-03 00:00:00 01:00 0.673739 10.000000
2021-03-03 00:30:00 01:00 0.452588 10.000000
2021-03-03 01:00:00 01:00 0.790415 9.000000
...
I would like to aggregate data by night and compute the sum() and mean() of kwh and temperatures.
The night begins at 23:00 and ends at 6:00 (excluded).
The output must be a new dataframe where each row represents a single night :
night kwh_sum kwh_mean temperature_mean
2022-02-28 7.8 1.2 5.3
2022-03-01 10.5 2.3 6.7
2022-03-02 3.2 0.2 7.0
...
CodePudding user response:
Here is a way where both the last hour of a day D
and the next 6 hours of the next day (D 1
) are counted together in night = D
:
out = (
df
.assign(night=(df['datetime'] - pd.Timedelta('23h')).dt.tz_convert(None))
.query('night.dt.hour < 7')
.groupby(pd.Grouper(freq='D', key='night'))
.agg({'kwh':[sum, 'mean'], 'temperature': ['mean']})
)
out.columns = ['_'.join(levels) for levels in out.columns]
On your data sample:
kwh_sum kwh_mean temperature_mean
night
2021-03-01 14.892640 1.241053 6.041667
2021-03-02 1.916742 0.638914 9.666667