Given this dataset:
ID timestamp type
1 2022-12-12 01:00:00 TypeA
2 2022-12-12 05:00:00 TypeA
3 2022-12-12 06:00:00 TypeA
4 2022-12-12 07:00.00 TypeB
5 2022-12-13 00:00:00 TypeA
6 2022-12-13 02:00:00 TypeB
7 2022-12-13 23:00:00 TypeA
I want, for each row, to count the number of rows of its same type in the last N hours. In the example, for N=24h, that would be:
ID timestamp type count
1 2022-12-12 01:00:00 TypeA 0
2 2022-12-12 05:00:00 TypeA 1
3 2022-12-12 06:00:00 TypeA 2
4 2022-12-12 07:00.00 TypeB 0
5 2022-12-13 00:00:00 TypeA 3
6 2022-12-13 02:00:00 TypeB 1
7 2022-12-13 23:00:00 TypeA 1
CodePudding user response:
I am not sure if this is what you want?
In [96]: df.set_index("datetime").groupby("type").rolling("24h").count()
Out[96]:
ID timestamp
type datetime
TypeA 2022-12-12 01:00:00 1.0 1.0
2022-12-12 05:00:00 2.0 2.0
2022-12-12 06:00:00 3.0 3.0
2022-12-13 00:00:00 4.0 4.0
2022-12-13 23:00:00 2.0 2.0
TypeB 2022-12-12 07:00:00 1.0 1.0
2022-12-13 02:00:00 2.0 2.0
In [125]: df.set_index("datetime").groupby("type")['ID'].rolling("23h", min_periods=0, closed="left").agg({"count":"count"})
Out[125]:
count
type datetime
TypeA 2022-12-12 01:00:00 0.0
2022-12-12 05:00:00 1.0
2022-12-12 06:00:00 2.0
2022-12-13 00:00:00 3.0
2022-12-13 23:00:00 1.0
TypeB 2022-12-12 07:00:00 0.0
2022-12-13 02:00:00 1.0