Is there a way to use groupby with a time threshold? Let's assume a 5-minute time threshold: for example, group by times that are within 1 minute of each other. The datetimes will be monotonically increasing, but may span multiple days.
| | Cents | ED_YF | Vals | EventTimestamp |
|---|---|---|---|---|
| 10 | 182.5 | 3 | 323.473 | 2022-06-27 13:41:01 |
| 11 | 182.5 | 3 | 323.473 | 2022-06-27 13:41:01 |
| 12 | 153.5 | 2 | 384.767 | 2022-06-28 14:36:15 |
| 13 | 153.5 | 2 | 384.767 | 2022-06-28 14:37:11 |
| 14 | 43 | 2 | 107.785 | 2022-06-28 16:40:37 |
| 15 | 43 | 2 | 107.785 | 2022-06-28 16:40:37 |
| 4 | 152 | 2 | 381.007 | 2022-06-29 16:41:00 |
| 5 | 152 | 2 | 381.007 | 2022-06-29 16:41:00 |
I've seen a threshold used as an input to `merge_asof`, but I haven't seen anything similar in the pandas docs for a groupby. In this example, I'm looking to add a column `GroupID` (via `.ngroup()`) based on the column `EventTimestamp`:
| | Cents | ED_YF | Vals | EventTimestamp | GroupID |
|---|---|---|---|---|---|
| 10 | 182.5 | 3 | 323.473 | 2022-06-27 13:41:01 | 0 |
| 11 | 182.5 | 3 | 323.473 | 2022-06-27 13:41:01 | 0 |
| 12 | 153.5 | 2 | 384.767 | 2022-06-28 14:36:15 | 1 |
| 13 | 153.5 | 2 | 384.767 | 2022-06-28 14:37:11 | 1 |
| 14 | 43 | 2 | 107.785 | 2022-06-28 16:40:37 | 2 |
| 15 | 43 | 2 | 107.785 | 2022-06-28 16:40:37 | 2 |
| 4 | 152 | 2 | 381.007 | 2022-06-29 16:41:00 | 3 |
| 5 | 152 | 2 | 381.007 | 2022-06-29 16:41:00 | 3 |
Thanks!
EDIT: I have tried `df['GroupID'] = (df['EventTimestamp'].diff() >= pd.Timedelta('1T')).cumsum()`, but this doesn't work across multiple days.
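For reference, the diff/cumsum idea from the EDIT above does handle day boundaries as long as `EventTimestamp` is a true datetime column: `.diff()` then yields `Timedelta` values of any size, and gaps spanning days compare against the threshold just like gaps of a few seconds (if the column is stored as strings, `.diff()` fails, which is one possible cause of the behavior described). A minimal sketch on the sample data, using the 5-minute threshold:

```python
import pandas as pd

df = pd.DataFrame({
    "Cents": [182.5, 182.5, 153.5, 153.5, 43, 43, 152, 152],
    "ED_YF": [3, 3, 2, 2, 2, 2, 2, 2],
    "Vals": [323.473, 323.473, 384.767, 384.767,
             107.785, 107.785, 381.007, 381.007],
    "EventTimestamp": pd.to_datetime([
        "2022-06-27 13:41:01", "2022-06-27 13:41:01",
        "2022-06-28 14:36:15", "2022-06-28 14:37:11",
        "2022-06-28 16:40:37", "2022-06-28 16:40:37",
        "2022-06-29 16:41:00", "2022-06-29 16:41:00",
    ]),
})

# A new group starts whenever the gap to the previous row exceeds the
# threshold; cumsum of that boolean flag yields consecutive group IDs.
# The first row's diff is NaT, which compares False, so it lands in group 0.
threshold = pd.Timedelta("5min")
df["GroupID"] = df["EventTimestamp"].diff().gt(threshold).cumsum()

print(df["GroupID"].tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]
```

This reproduces the desired `GroupID` column above: rows 56 seconds apart fall in one group, while the multi-hour and multi-day gaps each start a new one.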
CodePudding user response:
IIUC, I think what you are searching for is `pd.Grouper`: with it you can define the frequency between the datetimes you want to group by.

df['GroupID'] = df.groupby(pd.Grouper(key='EventTimestamp', freq='1min')).ngroup()

I didn't test it out, but I am pretty sure this is what you are looking for.