Home > Software design >  Pandas timeseries : Compute the average each night (between 11 pm and 6 am)
Pandas timeseries : Compute the average each night (between 11 pm and 6 am)

Time:04-07

My datas look like this:

datetime                  kwh             temperature
2021-03-01 23:00:00 01:00 1.276584        8.000000
2021-03-01 23:30:00 01:00 0.833173        8.000000
2021-03-02 00:00:00 01:00 1.480995        7.000000
2021-03-02 00:30:00 01:00 1.094157        8.000000
2021-03-02 01:00:00 01:00 1.221725        7.000000
2021-03-02 01:30:00 01:00 0.893175        6.000000
2021-03-02 02:00:00 01:00 0.663352        7.000000
2021-03-02 02:30:00 01:00 0.984540        6.500000
2021-03-02 03:00:00 01:00 1.332855        6.000000
2021-03-02 03:30:00 01:00 1.108477        6.000000
2021-03-02 04:00:00 01:00 1.219103        6.000000
2021-03-02 04:30:00 01:00 1.455885        5.000000
2021-03-02 05:00:00 01:00 1.758719        4.000000
2021-03-02 05:30:00 01:00 1.679657        4.000000
2021-03-02 23:00:00 01:00 2.482981        9.000000
2021-03-02 23:30:00 01:00 0.830652        8.000000
2021-03-03 00:00:00 01:00 0.673739       10.000000
2021-03-03 00:30:00 01:00 0.452588       10.000000
2021-03-03 01:00:00 01:00 0.790415        9.000000
...

I would like to aggregate data by night and compute the sum() and mean() of kwh and temperatures.

The night begins at 23:00 and ends at 6:00 (excluded).

The output must be a new dataframe where each row represents a single night :

night            kwh_sum  kwh_mean  temperature_mean
2022-02-28       7.8      1.2       5.3
2022-03-01       10.5     2.3       6.7
2022-03-02       3.2      0.2       7.0
...

CodePudding user response:

Here is a way where both the last hour of a day D and the next 6 hours of the next day (D 1) are counted together in night = D:

out = (
    df
    .assign(night=(df['datetime'] - pd.Timedelta('23h')).dt.tz_convert(None))
    .query('night.dt.hour < 7')
    .groupby(pd.Grouper(freq='D', key='night'))
    .agg({'kwh':[sum, 'mean'], 'temperature': ['mean']})
)
out.columns = ['_'.join(levels) for levels in out.columns]

On your data sample:

              kwh_sum  kwh_mean  temperature_mean
night                                            
2021-03-01  14.892640  1.241053          6.041667
2021-03-02   1.916742  0.638914          9.666667
  • Related