I have a DataFrame whose index is an hourly datetime and whose only column is counts. It looks like the following table:
date                       counts
2017-03-31 00:00:00+00:00  0.0
2017-03-31 01:00:00+00:00  0.0
2017-03-31 02:00:00+00:00  0.0
2017-03-31 03:00:00+00:00  0.0
2017-03-31 04:00:00+00:00  0.0
...                        ...
2022-06-19 19:00:00+00:00  6.0
2022-06-19 20:00:00+00:00  6.0
2022-06-19 21:00:00+00:00  1.0
2022-06-19 22:00:00+00:00  1.0
2022-06-19 23:00:00+00:00  1.0
If there are 15 or more consecutive hours of zero counts, they are considered errors and I want to flag them. The DataFrame is not complete: there are missing dates (gaps) in the data.
I tried resampling the DataFrame into 15-hour bins and finding the bins whose sum is zero, but that didn't give me the correct answer.
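Resampling cannot work reliably here because resample cuts the timeline into fixed 15-hour bins, while a run of zeros can start at any hour and straddle a bin boundary. A minimal sketch on made-up data (the values are hypothetical, not from the question) in which 15 consecutive zeros are split 10/5 across two bins, so neither bin sums to zero:

```python
import pandas as pd

# Hypothetical toy data: 30 hourly rows, all 1.0 except a run of 15 zeros at hours 5-19.
idx = pd.date_range("2017-03-31", periods=30, freq="h", tz="UTC")
counts = [1.0] * 30
for i in range(5, 20):
    counts[i] = 0.0
df = pd.DataFrame({"counts": counts}, index=idx)

# Fixed 15-hour bins start at midnight: [00:00, 15:00) and [15:00, 06:00 next day).
# The zero run covers hours 5-14 of the first bin and hours 15-19 of the second.
sums = df["counts"].resample("15h").sum()
print(sums)
```

Both bin sums are non-zero (5.0 and 10.0) even though the data contains a full 15-hour run of zeros, which is why a sliding window or a run-length count is needed instead.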
Answer:
If counts is guaranteed to be non-negative, you can use rolling and check for the max value:
df["is_error"] = df["counts"].rolling(15).max() == 0
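Note that rolling(15) marks only the row that completes each 15-row window, so the first 14 zeros of a run stay False. A sketch on hypothetical toy data that spreads each window-end flag back over the 14 preceding rows by running a second rolling max over the reversed series, so every row of the run is flagged:

```python
import pandas as pd

# Hypothetical toy data: 20 hourly rows with 16 zeros in a row (positions 2-17).
idx = pd.date_range("2017-03-31", periods=20, freq="h", tz="UTC")
counts = [3.0, 1.0] + [0.0] * 16 + [2.0, 5.0]
df = pd.DataFrame({"counts": counts}, index=idx)

# True only where this row and the 14 rows before it are all zero.
window_end = df["counts"].rolling(15).max() == 0

# Reverse, take a rolling max over 15 rows, reverse back: a row becomes True
# if any window ending within the next 14 rows (or at itself) was all zero.
df["is_error"] = (
    window_end.astype(float)
    .iloc[::-1]
    .rolling(15, min_periods=1)
    .max()
    .iloc[::-1]
    .eq(1)
)
print(df)
```

Here window_end is True only at positions 16 and 17, while is_error covers all 16 zeros. Like the one-liner above, this counts rows rather than hours.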
If counts can be negative, you have to check both min and max:
r = df["counts"].rolling(15)
df["is_error"] = r.min().eq(0) & r.max().eq(0)
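Both rolling versions count rows, not hours, so with the gaps mentioned in the question a 15-row window can span missing hours. A minimal sketch, assuming a missing hour should break a run (the question does not say so): put the series on a complete hourly grid with asfreq, then count zeros per 15-hour window. Testing eq(0) instead of min/max also makes it independent of the sign of counts. The toy data and the names row_based and is_zero are hypothetical.

```python
import pandas as pd

# Hypothetical data: hours 0-19 are zero but hours 6-9 are missing, hour 20 is
# non-zero, hours 21-35 are zero (a real 15-hour run), hours 36-39 are non-zero.
idx = pd.date_range("2017-03-31", periods=40, freq="h", tz="UTC")
counts = pd.Series(0.0, index=idx)
counts.iloc[20] = 1.0
counts.iloc[36:] = 2.0
df = counts.drop(idx[6:10]).to_frame("counts")

# Row-based window: the 4-hour gap is invisible, so 16 zero rows look like a run.
row_based = df["counts"].rolling(15).max() == 0

# Hour-based window: asfreq inserts the missing hours as NaN, and NaN.eq(0) is
# False, so a gap interrupts any run of zeros.
is_zero = df["counts"].asfreq("h").eq(0).astype(float)
window_end = is_zero.rolling(15).sum().eq(15)

# Map the flags back onto the rows that actually exist.
df["is_error"] = window_end.reindex(df.index)
print(df[row_based | df["is_error"]])
```

row_based flags hours 18, 19 and 35 (the first two only because the gap was ignored), while is_error flags hour 35 alone. This still marks only the hour that completes each 15-hour window, not every hour of the run.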
Answer:
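Assuming the dates are sorted, start a new group at every non-zero value, count the zeros in each group, and flag the zero rows of any group with 15 or more zeros. A group holds the non-zero row that opens it plus the zeros after it, so count the zeros rather than the group size:
m = df['counts'].ne(0)
c = (~m).astype(int).groupby(m.cumsum()).transform('sum')
df['error'] = c.ge(15).mask(m, False)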
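The same run-length idea can also respect the gaps in the index. A minimal sketch, assuming a missing hour should end a run (the question does not say): a new run starts at every non-zero row and wherever the step from the previous timestamp is not exactly one hour. The data and the names run_id and zeros_in_run are illustrative only.

```python
import pandas as pd

# Hypothetical data: hours 6-9 are missing, so the zeros on either side of the
# gap form two separate runs (6 h and 10 h); hours 21-35 are a real 15 h run.
idx = pd.date_range("2017-03-31", periods=40, freq="h", tz="UTC")
counts = pd.Series(0.0, index=idx)
counts.iloc[20] = 1.0
counts.iloc[36:] = 2.0
df = counts.drop(idx[6:10]).to_frame("counts")

nonzero = df["counts"].ne(0)
# True where the previous row is not exactly one hour earlier (including the first row).
gap = df.index.to_series().diff().ne(pd.Timedelta(hours=1))
# Each non-zero row or gap starts a new run id.
run_id = (nonzero | gap).cumsum()
# Number of zero rows in each run, broadcast back to every row of that run.
zeros_in_run = (~nonzero).astype(int).groupby(run_id).transform("sum")
# Flag only the zero rows of runs with at least 15 zeros.
df["error"] = zeros_in_run.ge(15) & ~nonzero
print(df[df["error"]])
```

Unlike the rolling approach, this flags every hour of a qualifying run (here hours 21-35) in a single pass, and the gap keeps the 16 zero rows around hours 6-9 from being treated as one run.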