Pandas GroupBy DateTime with time threshold across days


Is there a way to use groupby with a time threshold? Let's assume a threshold of, say, 1 minute: group together rows whose times are within 1 minute of each other. The datetimes are monotonically increasing, but may span multiple days.

    Cents  ED_YF     Vals       EventTimestamp
10  182.5      3  323.473  2022-06-27 13:41:01
11  182.5      3  323.473  2022-06-27 13:41:01
12  153.5      2  384.767  2022-06-28 14:36:15
13  153.5      2  384.767  2022-06-28 14:37:11
14     43      2  107.785  2022-06-28 16:40:37
15     43      2  107.785  2022-06-28 16:40:37
4     152      2  381.007  2022-06-29 16:41:00
5     152      2  381.007  2022-06-29 16:41:00

I've seen this done with merge_asof, where a tolerance can be supplied as an input, but I haven't seen anything similar in the pandas docs for a GroupBy... In this example, I'm looking to add a column ["GroupID"] (via .ngroup()) based on the column EventTimestamp:

    Cents  ED_YF     Vals       EventTimestamp  GroupID
10  182.5      3  323.473  2022-06-27 13:41:01        0
11  182.5      3  323.473  2022-06-27 13:41:01        0
12  153.5      2  384.767  2022-06-28 14:36:15        1
13  153.5      2  384.767  2022-06-28 14:37:11        1
14     43      2  107.785  2022-06-28 16:40:37        2
15     43      2  107.785  2022-06-28 16:40:37        2
4     152      2  381.007  2022-06-29 16:41:00        3
5     152      2  381.007  2022-06-29 16:41:00        3

Thanks!

EDIT: I have tried using df['GroupID'] = (df['EventTimestamp'].diff() >= pd.Timedelta('1T')).cumsum(), but this doesn't work across multiple days.
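
For reference, here is a minimal, runnable reproduction of that attempt, with the sample frame rebuilt from the table above ('1T' is the minute alias, so the rule starts a new group whenever the gap to the previous row is a minute or more):

import pandas as pd

df = pd.DataFrame(
    {
        "Cents": [182.5, 182.5, 153.5, 153.5, 43, 43, 152, 152],
        "ED_YF": [3, 3, 2, 2, 2, 2, 2, 2],
        "Vals": [323.473, 323.473, 384.767, 384.767,
                 107.785, 107.785, 381.007, 381.007],
        "EventTimestamp": pd.to_datetime([
            "2022-06-27 13:41:01", "2022-06-27 13:41:01",
            "2022-06-28 14:36:15", "2022-06-28 14:37:11",
            "2022-06-28 16:40:37", "2022-06-28 16:40:37",
            "2022-06-29 16:41:00", "2022-06-29 16:41:00",
        ]),
    },
    index=[10, 11, 12, 13, 14, 15, 4, 5],
)

# diff() is NaT for the first row; NaT >= Timedelta compares False, so the
# first row stays in group 0. Every gap of 1 minute or more starts a new group.
df["GroupID"] = (df["EventTimestamp"].diff() >= pd.Timedelta("1T")).cumsum()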

CodePudding user response:

IIUC, I think what you are searching for is pd.Grouper; with it you can define the freq at which the datetimes are grouped:

df['GroupID'] = df.groupby(pd.Grouper(key='EventTimestamp', freq='1min')).ngroup()

I didn't test it out, but I am pretty sure this is what you're looking for.
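
For completeness, a small sketch of what those 1-minute bins look like on the sample frame above. Note that pd.Grouper(freq=...) bins rows into fixed calendar windows (the same windows resample uses), not by the gap to the previous row; the helper column name "window" here is just illustrative:

# Left edge of the 1-minute calendar window each row falls into, which is
# how the default Grouper bins are labelled:
df["window"] = df["EventTimestamp"].dt.floor("1min")

# 14:36:15 floors to 14:36:00 and 14:37:11 floors to 14:37:00, so those two
# rows land in different bins even though they are only 56 seconds apart;
# fixed-frequency binning is not the same as a "within 1 minute of the
# previous row" rule from the question's edit.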
