Add a column in a pandas dataframe based on if 100 values in another row are smaller than x-CodePudding

i have a df like this:

timestamp                    values
2021-01-01 00:00:00 00:00    0.125992
2021-01-01 00:15:00 00:00    0.119612
2021-01-01 00:30:00 00:00    0.099005
2021-01-01 00:45:00 00:00    0.094307
2021-01-01 01:00:00 00:00    0.098852
  
2021-12-31 22:45:00 00:00    0.072708
2021-12-31 23:00:00 00:00    0.080016
2021-12-31 23:15:00 00:00    0.075784
2021-12-31 23:30:00 00:00    0.079673
2021-12-31 23:45:00 00:00    0.065123

I would like to check if at some points the values are smaller than 0.15 for over 100 timestamps (in the excerpt here we only see night values which are low but during the day the values are more like 0.7~). So if over a period of 100 timestamps the values are successively smaller than 0.15 i would like to add a new column that is titled "holidays" that fills the said 100 rows with 1 and 0 otherwise. How could i do this with pandas?

Working solution, with test_df_ being the dataframe above:

VALUE_THRE = 0.15
STEPS_THRE = 100

below_thre = test_df_["values"].lt(VALUE_THRE) 
test_df_["is_not_at_home?"] = df.groupby(below_thre.diff().ne(0).cumsum())["values"].transform("size").where(below_thre).gt(STEPS_THRE).astype(int)

                             values   is_not_at_home?
Time                                                                     
2021-01-01 00:00:00 00:00  0.125992                 0
2021-01-01 00:15:00 00:00  0.119612                 0
2021-01-01 00:30:00 00:00  0.099005                 0
2021-01-01 00:45:00 00:00  0.094307                 0
2021-01-01 01:00:00 00:00  0.098852                 0
                            ...  ...  ...  ...    ...  ..              ...
2021-12-31 22:30:00 00:00  0.079667                 0
2021-12-31 22:45:00 00:00  0.072708                 0
2021-12-31 23:00:00 00:00  0.080016                 0
2021-12-31 23:15:00 00:00  0.075784                 0
2021-12-31 23:30:00 00:00  0.079673                 0

CodePudding user response：

I hope your dataframe index is sequential (0, 1, 2, ...). If not, reset it with df.reset_index() before running this code:

# For each row, check if the rolling 100 rows (including
# itself) are all less than 0.15
s = df.rolling(100)["values"].max() < 0.15

# For each of these rows, calculate the indices of the rolling window
idx = s[s].index.to_numpy()[:, None] - np.arange(100)
idx = np.unique(np.ravel(idx))

# Result
df["holiday"] = np.where(df.index.isin(idx), 1, 0)