Home > database >  Pandas: number of occurrences in the last N hours for each row by row's criteria
Pandas: number of occurrences in the last N hours for each row by row's criteria

Time:12-14

Given this dataset:

ID timestamp           type
 1 2022-12-12 01:00:00 TypeA
 2 2022-12-12 05:00:00 TypeA
 3 2022-12-12 06:00:00 TypeA
 4 2022-12-12 07:00.00 TypeB
 5 2022-12-13 00:00:00 TypeA
 6 2022-12-13 02:00:00 TypeB
 7 2022-12-13 23:00:00 TypeA

I want, for each row, to count the number of rows of its same type in the last N hours. In the example, for N=24h, that would be:

ID timestamp           type  count
 1 2022-12-12 01:00:00 TypeA 0
 2 2022-12-12 05:00:00 TypeA 1
 3 2022-12-12 06:00:00 TypeA 2
 4 2022-12-12 07:00.00 TypeB 0
 5 2022-12-13 00:00:00 TypeA 3
 6 2022-12-13 02:00:00 TypeB 1
 7 2022-12-13 23:00:00 TypeA 1

CodePudding user response:

I am not sure if this is what you want?

In [96]: df.set_index("datetime").groupby("type").rolling("24h").count()
Out[96]:
                            ID  timestamp
type  datetime
TypeA 2022-12-12 01:00:00  1.0        1.0
      2022-12-12 05:00:00  2.0        2.0
      2022-12-12 06:00:00  3.0        3.0
      2022-12-13 00:00:00  4.0        4.0
      2022-12-13 23:00:00  2.0        2.0
TypeB 2022-12-12 07:00:00  1.0        1.0
      2022-12-13 02:00:00  2.0        2.0

In [125]: df.set_index("datetime").groupby("type")['ID'].rolling("23h", min_periods=0, closed="left").agg({"count":"count"})
Out[125]:
                           count
type  datetime
TypeA 2022-12-12 01:00:00    0.0
      2022-12-12 05:00:00    1.0
      2022-12-12 06:00:00    2.0
      2022-12-13 00:00:00    3.0
      2022-12-13 23:00:00    1.0
TypeB 2022-12-12 07:00:00    0.0
      2022-12-13 02:00:00    1.0
  • Related