Home > OS >  How to take one-hour rolling average from a Dataframe in Python with flexible number of readings in
How to take one-hour rolling average from a Dataframe in Python with flexible number of readings in

Time:07-02

I want to take a one-hour rolling average from a Dataframe in Python but the problem is the number of readings in each hour is not fixed and I cannot put a constant number in the rolling average function. Below is what my Dataframe looks like:

    local_date_time        visits
0   2022-04-29 17:03:25     468
2   2022-04-29 17:23:24     461
4   2022-04-29 17:43:24     458
5   2022-04-29 17:53:24     454
6   2022-04-29 18:03:25     464
11  2022-04-29 18:53:25     517
13  2022-04-29 19:13:24     534
14  2022-04-29 19:23:25     541
15  2022-04-29 19:33:24     522
20  2022-04-29 20:32:06     521
21  2022-04-29 20:42:06     537
27  2022-04-29 21:42:06     527
28  2022-04-29 21:52:06     519
29  2022-04-29 22:02:06     524
30  2022-04-29 22:12:05     514
31  2022-04-29 22:22:06     512
32  2022-04-29 22:32:06     525
33  2022-04-29 22:42:06     515
34  2022-04-29 23:55:06     518

As demonstrated, the number of past one-hour readings/measurements for each data point is varied between 1 and 6 readings/measurements. I want to write a code to have the following outputs.

    local_date_time     visits      past_1hr_readings       1hr_rolling_average
0   2022-04-29 17:03:25     468       1                          NAN
2   2022-04-29 17:23:24     461       2                          NAN
4   2022-04-29 17:43:24     458       3                          NAN
5   2022-04-29 17:53:24     454       4                         460.25
6   2022-04-29 18:03:25     464       4                         459.25
11  2022-04-29 18:53:25     517       2                         490.50
13  2022-04-29 19:13:24     534       2                         525.50
14  2022-04-29 19:23:25     541       3                         530.67
15  2022-04-29 19:33:24     522       4                         528.5
20  2022-04-29 20:32:06     521       2                         521.5
21  2022-04-29 20:42:06     537       2                         529.0
27  2022-04-29 21:42:06     527       1                         527
28  2022-04-29 21:52:06     519       2                         523
29  2022-04-29 22:02:06     524       3                         523.33
30  2022-04-29 22:12:05     514       4                         521
31  2022-04-29 22:22:06     512       5                         519.2
32  2022-04-29 22:32:06     525       6                         520.17
33  2022-04-29 22:42:06     515       6                         518.17
34  2022-04-29 23:55:06     518       1                         518
  1. My FIRST question is how can I automatically generate the "past_1hr_readings" column?
  2. My SECOND question is how can I use the data in the "past_1hr_readings" column in a rolling average function?

I wrote the following code for the SECOND question, but it did not work.

df['1hr_rolling_average'] = df.co2.rolling(df['past_1hr_readings']).mean()

This is the error that I receive.

raise ValueError("window must be an integer 0 or greater") ValueError: window must be an integer 0 or greater

I would really appreciate it if you could help me with this code. I am also open to any other approach instead of the approach that I explained.

CodePudding user response:

The 1hr_rolling_average can be added with df['1hr_rolling_average'] = df.rolling('1h', min_periods=1).mean().

import sys
import pandas as pd
from io import StringIO
from datetime import datetime

data = StringIO("""local_date_time;visits
2022-04-29 17:03:25;468
2022-04-29 17:23:24;461
2022-04-29 17:43:24;458
2022-04-29 17:53:24;454
2022-04-29 18:03:25;464
2022-04-29 18:53:25;517
2022-04-29 19:13:24;534
2022-04-29 19:23:25;541
2022-04-29 19:33:24;522
2022-04-29 20:32:06;521
2022-04-29 20:42:06;537
2022-04-29 21:42:06;527
2022-04-29 21:52:06;519
2022-04-29 22:02:06;524
2022-04-29 22:12:05;514
2022-04-29 22:22:06;512
2022-04-29 22:32:06;525
2022-04-29 22:42:06;515
2022-04-29 23:55:06;518
""")

df = pd.read_csv(data, sep=";")
df.local_date_time = pd.to_datetime(df.local_date_time)
df.set_index('local_date_time', inplace=True)
df['1hr_rolling_average'] = df.rolling('1h', min_periods=1).mean()
print(df)

enter image description here

CodePudding user response:

You can use rolling with an offset only on='local_date_time' column:

# Ensure local_date_time is a valid DatetimeIndex
df['local_date_time'] = pd.to_datetime(df['local_date_time'])

df1 = (df.rolling('H', on='local_date_time')['visits']
         .agg({'past_1hr_readings': 'count', '1hr_rolling_average': 'mean'}))
df = pd.concat([df, df1], axis=1).astype({'past_1hr_readings': int})
print(df)

# Output
       local_date_time  visits  past_1hr_readings  1hr_rolling_average
0  2022-04-29 17:03:25     468                  1           468.000000
2  2022-04-29 17:23:24     461                  2           464.500000
4  2022-04-29 17:43:24     458                  3           462.333333
5  2022-04-29 17:53:24     454                  4           460.250000
6  2022-04-29 18:03:25     464                  4           459.250000
11 2022-04-29 18:53:25     517                  2           490.500000
13 2022-04-29 19:13:24     534                  2           525.500000
14 2022-04-29 19:23:25     541                  3           530.666667
15 2022-04-29 19:33:24     522                  4           528.500000
20 2022-04-29 20:32:06     521                  2           521.500000
21 2022-04-29 20:42:06     537                  2           529.000000
27 2022-04-29 21:42:06     527                  1           527.000000
28 2022-04-29 21:52:06     519                  2           523.000000
29 2022-04-29 22:02:06     524                  3           523.333333
30 2022-04-29 22:12:05     514                  4           521.000000
31 2022-04-29 22:22:06     512                  5           519.200000
32 2022-04-29 22:32:06     525                  6           520.166667
33 2022-04-29 22:42:06     515                  6           518.166667
34 2022-04-29 23:55:06     518                  1           518.000000
  • Related