I want to take a one-hour rolling average from a Dataframe in Python but the problem is the number of readings in each hour is not fixed and I cannot put a constant number in the rolling average function. Below is what my Dataframe looks like:
local_date_time visits
0 2022-04-29 17:03:25 468
2 2022-04-29 17:23:24 461
4 2022-04-29 17:43:24 458
5 2022-04-29 17:53:24 454
6 2022-04-29 18:03:25 464
11 2022-04-29 18:53:25 517
13 2022-04-29 19:13:24 534
14 2022-04-29 19:23:25 541
15 2022-04-29 19:33:24 522
20 2022-04-29 20:32:06 521
21 2022-04-29 20:42:06 537
27 2022-04-29 21:42:06 527
28 2022-04-29 21:52:06 519
29 2022-04-29 22:02:06 524
30 2022-04-29 22:12:05 514
31 2022-04-29 22:22:06 512
32 2022-04-29 22:32:06 525
33 2022-04-29 22:42:06 515
34 2022-04-29 23:55:06 518
As demonstrated, the number of past one-hour readings/measurements for each data point is varied between 1 and 6 readings/measurements. I want to write a code to have the following outputs.
local_date_time visits past_1hr_readings 1hr_rolling_average
0 2022-04-29 17:03:25 468 1 NAN
2 2022-04-29 17:23:24 461 2 NAN
4 2022-04-29 17:43:24 458 3 NAN
5 2022-04-29 17:53:24 454 4 460.25
6 2022-04-29 18:03:25 464 4 459.25
11 2022-04-29 18:53:25 517 2 490.50
13 2022-04-29 19:13:24 534 2 525.50
14 2022-04-29 19:23:25 541 3 530.67
15 2022-04-29 19:33:24 522 4 528.5
20 2022-04-29 20:32:06 521 2 521.5
21 2022-04-29 20:42:06 537 2 529.0
27 2022-04-29 21:42:06 527 1 527
28 2022-04-29 21:52:06 519 2 523
29 2022-04-29 22:02:06 524 3 523.33
30 2022-04-29 22:12:05 514 4 521
31 2022-04-29 22:22:06 512 5 519.2
32 2022-04-29 22:32:06 525 6 520.17
33 2022-04-29 22:42:06 515 6 518.17
34 2022-04-29 23:55:06 518 1 518
- My FIRST question is how can I automatically generate the "past_1hr_readings" column?
- My SECOND question is how can I use the data in the "past_1hr_readings" column in a rolling average function?
I wrote the following code for the SECOND question, but it did not work.
df['1hr_rolling_average'] = df.co2.rolling(df['past_1hr_readings']).mean()
This is the error that I receive.
raise ValueError("window must be an integer 0 or greater") ValueError: window must be an integer 0 or greater
I would really appreciate it if you could help me with this code. I am also open to any other approach instead of the approach that I explained.
CodePudding user response:
The 1hr_rolling_average
can be added with df['1hr_rolling_average'] = df.rolling('1h', min_periods=1).mean()
.
import sys
import pandas as pd
from io import StringIO
from datetime import datetime
data = StringIO("""local_date_time;visits
2022-04-29 17:03:25;468
2022-04-29 17:23:24;461
2022-04-29 17:43:24;458
2022-04-29 17:53:24;454
2022-04-29 18:03:25;464
2022-04-29 18:53:25;517
2022-04-29 19:13:24;534
2022-04-29 19:23:25;541
2022-04-29 19:33:24;522
2022-04-29 20:32:06;521
2022-04-29 20:42:06;537
2022-04-29 21:42:06;527
2022-04-29 21:52:06;519
2022-04-29 22:02:06;524
2022-04-29 22:12:05;514
2022-04-29 22:22:06;512
2022-04-29 22:32:06;525
2022-04-29 22:42:06;515
2022-04-29 23:55:06;518
""")
df = pd.read_csv(data, sep=";")
df.local_date_time = pd.to_datetime(df.local_date_time)
df.set_index('local_date_time', inplace=True)
df['1hr_rolling_average'] = df.rolling('1h', min_periods=1).mean()
print(df)
CodePudding user response:
You can use rolling
with an offset only on='local_date_time'
column:
# Ensure local_date_time is a valid DatetimeIndex
df['local_date_time'] = pd.to_datetime(df['local_date_time'])
df1 = (df.rolling('H', on='local_date_time')['visits']
.agg({'past_1hr_readings': 'count', '1hr_rolling_average': 'mean'}))
df = pd.concat([df, df1], axis=1).astype({'past_1hr_readings': int})
print(df)
# Output
local_date_time visits past_1hr_readings 1hr_rolling_average
0 2022-04-29 17:03:25 468 1 468.000000
2 2022-04-29 17:23:24 461 2 464.500000
4 2022-04-29 17:43:24 458 3 462.333333
5 2022-04-29 17:53:24 454 4 460.250000
6 2022-04-29 18:03:25 464 4 459.250000
11 2022-04-29 18:53:25 517 2 490.500000
13 2022-04-29 19:13:24 534 2 525.500000
14 2022-04-29 19:23:25 541 3 530.666667
15 2022-04-29 19:33:24 522 4 528.500000
20 2022-04-29 20:32:06 521 2 521.500000
21 2022-04-29 20:42:06 537 2 529.000000
27 2022-04-29 21:42:06 527 1 527.000000
28 2022-04-29 21:52:06 519 2 523.000000
29 2022-04-29 22:02:06 524 3 523.333333
30 2022-04-29 22:12:05 514 4 521.000000
31 2022-04-29 22:22:06 512 5 519.200000
32 2022-04-29 22:32:06 525 6 520.166667
33 2022-04-29 22:42:06 515 6 518.166667
34 2022-04-29 23:55:06 518 1 518.000000