We have a time series with a values
column apart from the date
column and for every day we have 24 rows (hours of the day). The goal is to create an additional column which contains the mean of the values for a specific hour of the day on a rolling window. For example for a rolling window of 365*24 the new column would convey the information: "The values
column at this hour of the day for the last year had a mean value of X".
The input and output should look something like this for a rolling window of 24:
index date(as datetime64) values new_column reasoning
0 2021-02-21 00:00:00 00:00 100 -
1 2021-02-21 01:00:00 00:00 200 -
2 2021-02-21 02:00:00 00:00 300 -
3 2021-02-21 03:00:00 00:00 400 -
4 2021-02-21 04:00:00 00:00 500 -
5 2021-02-21 05:00:00 00:00 600 -
6 2021-02-21 06:00:00 00:00 700 -
7 2021-02-21 07:00:00 00:00 800 -
8 2021-02-21 08:00:00 00:00 900 -
9 2021-02-21 09:00:00 00:00 1000 -
10 2021-02-21 10:00:00 00:00 1100 -
11 2021-02-21 11:00:00 00:00 1200 -
12 2021-02-21 12:00:00 00:00 1300 -
13 2021-02-21 13:00:00 00:00 1400 -
14 2021-02-21 14:00:00 00:00 1500 -
15 2021-02-21 15:00:00 00:00 1600 -
16 2021-02-21 16:00:00 00:00 1700 -
17 2021-02-21 17:00:00 00:00 1800 -
18 2021-02-21 18:00:00 00:00 1900 -
19 2021-02-21 19:00:00 00:00 2000 -
20 2021-02-21 20:00:00 00:00 2100 -
21 2021-02-21 21:00:00 00:00 2200 -
22 2021-02-21 22:00:00 00:00 2300 -
23 2021-02-21 23:00:00 00:00 2400 -
24 2021-02-22 00:00:00 00:00 200 150 (i.e the mean of 100 and 200, the values at 00:00 for the 2 days)
25 2021-02-22 01:00:00 00:00 400 300 (i.e. mean of 200 and 400)
26 2021-02-22 02:00:00 00:00 600 450 (i.e. mean of 300 and 600)
27 2021-02-22 03:00:00 00:00 800 etc. etc.
28 2021-02-22 04:00:00 00:00 1000
29 2021-02-22 05:00:00 00:00 1200
30 2021-02-22 06:00:00 00:00 1400
31 2021-02-22 07:00:00 00:00 1600
32 2021-02-22 08:00:00 00:00 1800
33 2021-02-22 09:00:00 00:00 2000
34 2021-02-22 10:00:00 00:00 2200
35 2021-02-22 11:00:00 00:00 2400
36 2021-02-22 12:00:00 00:00 2600
37 2021-02-22 13:00:00 00:00 2800
38 2021-02-22 14:00:00 00:00 3000
39 2021-02-22 15:00:00 00:00 3200
40 2021-02-22 16:00:00 00:00 3400
41 2021-02-22 17:00:00 00:00 3600
42 2021-02-22 18:00:00 00:00 3800
43 2021-02-22 19:00:00 00:00 4000
44 2021-02-22 20:00:00 00:00 4200
45 2021-02-22 21:00:00 00:00 4400
46 2021-02-22 22:00:00 00:00 4600
47 2021-02-22 23:00:00 00:00 4800
Before even going to making a new column, I thought of trying a groupby
:
means = df.groupby(df.date.dt.hour).values.mean()
However, that has the issue of returning the mean-per-hour for the entire dataframe and not a rolling window of it while a groupby(df.date.dt.hour).values.rolling(X).mean()
only considers a window of the final grouped results and not a window of the initial data to be considered before grouping, and I don't know how to assign the values to a new column to the initial dataframe since the groupby() results do not have the same index.
CodePudding user response:
Use droplevel
after groupby
/ rolling
to drop the level created by groupby
:
# Or .droplevel('date')
df['mean'] = df.groupby(df['date'].dt.hour).rolling(2)['values'].mean().droplevel(0)
print(df)
# Output
date values mean
0 2021-02-21 00:00:00 00:00 100 NaN
1 2021-02-21 01:00:00 00:00 200 NaN
2 2021-02-21 02:00:00 00:00 300 NaN
3 2021-02-21 03:00:00 00:00 400 NaN
4 2021-02-21 04:00:00 00:00 500 NaN
5 2021-02-21 05:00:00 00:00 600 NaN
6 2021-02-21 06:00:00 00:00 700 NaN
7 2021-02-21 07:00:00 00:00 800 NaN
8 2021-02-21 08:00:00 00:00 900 NaN
9 2021-02-21 09:00:00 00:00 1000 NaN
10 2021-02-21 10:00:00 00:00 1100 NaN
11 2021-02-21 11:00:00 00:00 1200 NaN
12 2021-02-21 12:00:00 00:00 1300 NaN
13 2021-02-21 13:00:00 00:00 1400 NaN
14 2021-02-21 14:00:00 00:00 1500 NaN
15 2021-02-21 15:00:00 00:00 1600 NaN
16 2021-02-21 16:00:00 00:00 1700 NaN
17 2021-02-21 17:00:00 00:00 1800 NaN
18 2021-02-21 18:00:00 00:00 1900 NaN
19 2021-02-21 19:00:00 00:00 2000 NaN
20 2021-02-21 20:00:00 00:00 2100 NaN
21 2021-02-21 21:00:00 00:00 2200 NaN
22 2021-02-21 22:00:00 00:00 2300 NaN
23 2021-02-21 23:00:00 00:00 2400 NaN
24 2021-02-22 00:00:00 00:00 200 150.0
25 2021-02-22 01:00:00 00:00 400 300.0
26 2021-02-22 02:00:00 00:00 600 450.0
27 2021-02-22 03:00:00 00:00 800 600.0
28 2021-02-22 04:00:00 00:00 1000 750.0
29 2021-02-22 05:00:00 00:00 1200 900.0
30 2021-02-22 06:00:00 00:00 1400 1050.0
31 2021-02-22 07:00:00 00:00 1600 1200.0
32 2021-02-22 08:00:00 00:00 1800 1350.0
33 2021-02-22 09:00:00 00:00 2000 1500.0
34 2021-02-22 10:00:00 00:00 2200 1650.0
35 2021-02-22 11:00:00 00:00 2400 1800.0
36 2021-02-22 12:00:00 00:00 2600 1950.0
37 2021-02-22 13:00:00 00:00 2800 2100.0
38 2021-02-22 14:00:00 00:00 3000 2250.0
39 2021-02-22 15:00:00 00:00 3200 2400.0
40 2021-02-22 16:00:00 00:00 3400 2550.0
41 2021-02-22 17:00:00 00:00 3600 2700.0
42 2021-02-22 18:00:00 00:00 3800 2850.0
43 2021-02-22 19:00:00 00:00 4000 3000.0
44 2021-02-22 20:00:00 00:00 4200 3150.0
45 2021-02-22 21:00:00 00:00 4400 3300.0
46 2021-02-22 22:00:00 00:00 4600 3450.0
47 2021-02-22 23:00:00 00:00 4800 3600.0