How to find consecutive zeros in time series

Time:08-24

I have a data frame whose index is an hourly timestamp and whose only column is counts. It looks like the following table:

date                        counts
2017-03-31 00:00:00+00:00   0.0
2017-03-31 01:00:00+00:00   0.0
2017-03-31 02:00:00+00:00   0.0
2017-03-31 03:00:00+00:00   0.0
2017-03-31 04:00:00+00:00   0.0
...                         ...
2022-06-19 19:00:00+00:00   6.0
2022-06-19 20:00:00+00:00   6.0
2022-06-19 21:00:00+00:00   1.0
2022-06-19 22:00:00+00:00   1.0
2022-06-19 23:00:00+00:00   1.0

If there are 15 consecutive hours of zero counts, they are considered an error and I want to flag them. The data frame is not complete: there are missing dates (gaps) in the data.

I tried resampling the data frame into 15-hour bins and looking for bins whose sum is zero, but that didn't give me the correct answer.

CodePudding user response:

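If counts is guaranteed to be non-negative, you can use a rolling window of 15 rows and check that its max value is 0. This marks the row that completes each window of 15 zeros (and every following row while the run continues). Note that rolling(15) counts rows, not hours, so it assumes there are no missing timestamps: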

df["is_error"] = df["counts"].rolling(15).max() == 0

If counts can be negative, you have to check both min and max:

r = df["counts"].rolling(15)
df["is_error"] = r.min().eq(0) & r.max().eq(0) 
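Since the question mentions gaps, here is a minimal sketch of a time-aware variant of the same idea, assuming the index is a sorted DatetimeIndex with at most one row per hour. The sample data and the extra count check are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# Hypothetical sample: 20 hourly zeros, a 3-hour gap, then 10 more zeros.
idx = pd.date_range("2017-03-31 00:00", periods=20, freq=hour).append(
    pd.date_range("2017-03-31 23:00", periods=10, freq=hour)
)
df = pd.DataFrame({"counts": 0.0}, index=idx)

# Time-based window: the window ending at t covers (t - 15h, t].
# With hourly data it is complete only if it holds 15 observations,
# so windows that straddle a gap are rejected by the count check.
r = df["counts"].rolling(pd.Timedelta(hours=15))
df["is_error"] = r.count().eq(15) & r.min().eq(0) & r.max().eq(0)
```

In this sample only the rows that complete a full 15-hour stretch of zeros in the first block are marked; the row-based rolling(15) would also mark the rows after the gap, because it treats them as adjacent to the earlier ones.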

CodePudding user response:

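Assuming the dates are sorted, group each run of successive zeros, count the zeros in it, and flag the whole run if it contains 15 or more:

m = df['counts'].ne(0)
# every non-zero row starts a new group; count only the zero rows in each group
c = (~m).astype(int).groupby(m.cumsum()).transform('sum')

# flag the zero rows that belong to a run of at least 15 zeros
df['error'] = c.ge(15) & ~m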
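The grouping above treats neighbouring rows as consecutive even when hours are missing between them. A minimal sketch of a gap-aware version follows, assuming a sorted DatetimeIndex at hourly resolution; the sample data and the names is_zero, gap, and run_id are hypothetical.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# Hypothetical sample: 16 hourly zeros, one non-zero hour, 10 zeros,
# a 3-hour gap, then 10 more zeros.
idx = pd.date_range("2017-03-31 00:00", periods=27, freq=hour).append(
    pd.date_range("2017-04-01 06:00", periods=10, freq=hour)
)
df = pd.DataFrame({"counts": [0.0] * 16 + [5.0] + [0.0] * 20}, index=idx)

is_zero = df["counts"].eq(0)
# A new run starts at every non-zero row and wherever the step from the
# previous timestamp is not exactly one hour (the first row included).
gap = df.index.to_series().diff().ne(hour)
run_id = (~is_zero | gap).cumsum()

# Count the zero rows in each run, then flag zero rows in runs of 15 or more.
zeros_in_run = is_zero.astype(int).groupby(run_id).transform("sum")
df["error"] = is_zero & zeros_in_run.ge(15)
```

Here only the first 16 rows are flagged. The 20 zeros after the non-zero hour are split by the gap into two runs of 10, so neither reaches the threshold.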