Home > Net >  error when using time in rolling function pandas
error when using time in rolling function pandas

Time:11-09

I am trying to calculate mean i.e moving average every 10sec of data; lets say 1 to 10sec, and 11sec to 20sec etc.

Please suggest if below is right for this. I am getting error when using "60sec" in rolling function, I think it may be due to the "ltt" column which is of type string, I am converting it to datetime, but still the error is coming.

Can someone please resolve this error? Also how to do the averaging for samples collected every 10sec. This is streaming data coming in, but for testing purpose, I am using the static data in record1.

import pandas as pd
import numpy as np

records1 = [
{'ltt': 'Mon Nov  7 12:12:05 2022', 'last': 258},
{'ltt': 'Mon Nov  7 12:12:05 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:07 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:08 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:09 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:10 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:11 2022', 'last': 261},
{'ltt': 'Mon Nov  7 12:12:12 2022', 'last': 262},
{'ltt': 'Mon Nov  7 12:12:12 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:14 2022', 'last': 258},
{'ltt': 'Mon Nov  7 12:12:15 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:16 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:17 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:18 2022', 'last': 258},
{'ltt': 'Mon Nov  7 12:12:19 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:20 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:21 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:22 2022', 'last': 258},
{'ltt': 'Mon Nov  7 12:12:23 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:24 2022', 'last': 260}
]

datalist = []

def strategy1(record):
    global datalist

    datalist.append(record)
    pandas_df = pd.DataFrame(datalist)
    pandas_df['ltt'] = pd.to_datetime(pandas_df['ltt'], format="%a %b %d %H:%M:%S %Y")

    pandas_df['hour'] = pandas_df['ltt'].dt.hour
    pandas_df['minute'] = pandas_df['ltt'].dt.minute
    pandas_df['second'] = pandas_df['ltt'].dt.second

    pandas_df['max'] = pandas_df.groupby('second')['last'].transform("max")

    pandas_df["ma_1min"] = (
        pandas_df.sort_values("ltt")
            .groupby(["hour", "minute"])["last"]
            .transform(lambda x: x.rolling('10sec', min_periods=1).mean())
    )

    print(pandas_df)

CodePudding user response:

i don't know how to exactly implement this in your code but i had a kind of similar problem where i had to group each day into 4 hour timeslots. so an approach might be something like this:

pandas_df.groupby([pandas_df['ltt'].dt.hour, pandas_df['ltt'].dt.minute, (pandas_df['ltt'].dt.second / 10).astype(int)]).last.agg('mean')

this should basically give you 6 groups ([0s-9s -> 0], [10s-19s -> 1], etc. for the 3rd groupby index) for every minute of data.

  • Related