i have a df like this:
timestamp values
2021-01-01 00:00:00 00:00 0.125992
2021-01-01 00:15:00 00:00 0.119612
2021-01-01 00:30:00 00:00 0.099005
2021-01-01 00:45:00 00:00 0.094307
2021-01-01 01:00:00 00:00 0.098852
2021-12-31 22:45:00 00:00 0.072708
2021-12-31 23:00:00 00:00 0.080016
2021-12-31 23:15:00 00:00 0.075784
2021-12-31 23:30:00 00:00 0.079673
2021-12-31 23:45:00 00:00 0.065123
I would like to check if at some points the values are smaller than 0.15 for over 100 timestamps (in the excerpt here we only see night values which are low but during the day the values are more like 0.7~). So if over a period of 100 timestamps the values are successively smaller than 0.15 i would like to add a new column that is titled "holidays" that fills the said 100 rows with 1 and 0 otherwise. How could i do this with pandas?
Working solution, with test_df_ being the dataframe above:
VALUE_THRE = 0.15
STEPS_THRE = 100
below_thre = test_df_["values"].lt(VALUE_THRE)
test_df_["is_not_at_home?"] = df.groupby(below_thre.diff().ne(0).cumsum())["values"].transform("size").where(below_thre).gt(STEPS_THRE).astype(int)
values is_not_at_home?
Time
2021-01-01 00:00:00 00:00 0.125992 0
2021-01-01 00:15:00 00:00 0.119612 0
2021-01-01 00:30:00 00:00 0.099005 0
2021-01-01 00:45:00 00:00 0.094307 0
2021-01-01 01:00:00 00:00 0.098852 0
... ... ... ... ... .. ...
2021-12-31 22:30:00 00:00 0.079667 0
2021-12-31 22:45:00 00:00 0.072708 0
2021-12-31 23:00:00 00:00 0.080016 0
2021-12-31 23:15:00 00:00 0.075784 0
2021-12-31 23:30:00 00:00 0.079673 0
CodePudding user response:
I hope your dataframe index is sequential (0, 1, 2, ...). If not, reset it with df.reset_index()
before running this code:
# For each row, check if the rolling 100 rows (including
# itself) are all less than 0.15
s = df.rolling(100)["values"].max() < 0.15
# For each of these rows, calculate the indices of the rolling window
idx = s[s].index.to_numpy()[:, None] - np.arange(100)
idx = np.unique(np.ravel(idx))
# Result
df["holiday"] = np.where(df.index.isin(idx), 1, 0)