Is there a way to use groupby with a time threshold? Let's assume a 5-minute time threshold: for example, group by times that are within 1 minute of each other. The datetimes will be monotonically increasing, but may span multiple days.
| | Cents | ED_YF | Vals | EventTimestamp |
|---|---|---|---|---|
| 10 | 182.5 | 3 | 323.473 | 2022-06-27 13:41:01 |
| 11 | 182.5 | 3 | 323.473 | 2022-06-27 13:41:01 |
| 12 | 153.5 | 2 | 384.767 | 2022-06-28 14:36:15 |
| 13 | 153.5 | 2 | 384.767 | 2022-06-28 14:37:11 |
| 14 | 43 | 2 | 107.785 | 2022-06-28 16:40:37 |
| 15 | 43 | 2 | 107.785 | 2022-06-28 16:40:37 |
| 4 | 152 | 2 | 381.007 | 2022-06-29 16:41:00 |
| 5 | 152 | 2 | 381.007 | 2022-06-29 16:41:00 |
I've seen a threshold used as an input to `merge_asof`, but I haven't seen anything similar in the pandas docs for a groupby. In this example, I'm looking to add a column `GroupID` (via `.ngroup()`) based on the column `EventTimestamp`:
| | Cents | ED_YF | Vals | EventTimestamp | GroupID |
|---|---|---|---|---|---|
| 10 | 182.5 | 3 | 323.473 | 2022-06-27 13:41:01 | 0 |
| 11 | 182.5 | 3 | 323.473 | 2022-06-27 13:41:01 | 0 |
| 12 | 153.5 | 2 | 384.767 | 2022-06-28 14:36:15 | 1 |
| 13 | 153.5 | 2 | 384.767 | 2022-06-28 14:37:11 | 1 |
| 14 | 43 | 2 | 107.785 | 2022-06-28 16:40:37 | 2 |
| 15 | 43 | 2 | 107.785 | 2022-06-28 16:40:37 | 2 |
| 4 | 152 | 2 | 381.007 | 2022-06-29 16:41:00 | 3 |
| 5 | 152 | 2 | 381.007 | 2022-06-29 16:41:00 | 3 |
Thanks!
EDIT: I have tried `df['GroupID'] = (df['EventTimestamp'].diff() >= pd.Timedelta('1T')).cumsum()`, but this doesn't work across multiple days.
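For reference, the diff/cumsum idea from the EDIT above does handle day boundaries as long as `EventTimestamp` is a true datetime column: `.diff()` then yields `Timedelta` values of any size, and gaps spanning days compare against the threshold just like gaps of a few seconds (if the column is stored as strings, `.diff()` fails, which is one possible cause of the behavior described). A minimal sketch on the sample data, using the 5-minute threshold:

```python
import pandas as pd

df = pd.DataFrame({
    "Cents": [182.5, 182.5, 153.5, 153.5, 43, 43, 152, 152],
    "ED_YF": [3, 3, 2, 2, 2, 2, 2, 2],
    "Vals": [323.473, 323.473, 384.767, 384.767,
             107.785, 107.785, 381.007, 381.007],
    "EventTimestamp": pd.to_datetime([
        "2022-06-27 13:41:01", "2022-06-27 13:41:01",
        "2022-06-28 14:36:15", "2022-06-28 14:37:11",
        "2022-06-28 16:40:37", "2022-06-28 16:40:37",
        "2022-06-29 16:41:00", "2022-06-29 16:41:00",
    ]),
})

# A new group starts whenever the gap to the previous row exceeds the
# threshold; cumsum of that boolean flag yields consecutive group IDs.
# The first row's diff is NaT, which compares False, so it lands in group 0.
threshold = pd.Timedelta("5min")
df["GroupID"] = df["EventTimestamp"].diff().gt(threshold).cumsum()

print(df["GroupID"].tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]
```

This reproduces the desired `GroupID` column above: rows 56 seconds apart fall in one group, while the multi-hour and multi-day gaps each start a new one.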
CodePudding user response:
IIUC, I think what you are searching for is `pd.Grouper`: with it you can define the frequency between the datetimes you want to group by.

df['GroupID'] = df.groupby(pd.Grouper(key='EventTimestamp', freq='1min')).ngroup()

I didn't test it out, but I am pretty sure this is what you are looking for.