Extract a window of rows following a run of ones in a pandas DataFrame

I have a pandas DataFrame that looks like the following:

Day     val
Day1    0
Day2    0
Day3    0
Day4    0
Day5    1
Day6    1
Day7    1
Day8    1
Day9    0
Day10   0
Day11   0
Day12   1
Day13   1
Day14   1
Day15   1
Day16   0
Day17   0
Day18   0
Day19   0
Day20   0
Day21   1
Day22   0
Day23   1
Day24   1
Day25   1

I want to extract at most 2 rows where val = 0, but only those that directly follow a run of 1s.

Example: There is a run of ones from Day5 to Day8 (I call this an event). I need to inspect at most two rows after the end of the event; here those are Day9 and Day10.

Similarly, Day21 can be treated as a single-day event, and I need only Day22, since it is a zero that follows an event.

For the table above, the output would be the following:

Day     val
Day9    0
Day10   0
Day16   0
Day17   0
Day22   0

CodePudding user response：

We can simplify the condition to two parts: 1) val should be equal to 0, and 2) val on the previous day or on the day before that should be 1.

In code:

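# True where either of the two preceding rows has val == 1
cond = (df['val'].shift(1) == 1) | (df['val'].shift(2) == 1)
# Keep only the zero rows that satisfy that condition
df.loc[(df['val'] == 0) & cond]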

Result:

      Day  val
8    Day9    0
9   Day10    0
15  Day16    0
16  Day17    0
21  Day22    0

Note: If more than 2 days should be considered, further shifts can easily be added to the condition cond. In that case, cond can be constructed with a list comprehension and any().
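
As a rough sketch of that generalization: the variable N and the sample data rebuilt from the question are assumptions added for illustration. The any() step is interpreted here as DataFrame.any(axis=1) on the concatenated shifts, since Python's builtin any() cannot be applied directly to a list of Series.

```python
import pandas as pd

# Sample data rebuilt from the question (Day1..Day25)
vals = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
df = pd.DataFrame({"Day": [f"Day{i}" for i in range(1, len(vals) + 1)],
                   "val": vals})

N = 2  # assumed name: how many rows after an event to consider

# One boolean column per shift: "was val equal to 1 exactly k rows earlier?"
shifted = pd.concat([df["val"].shift(k) == 1 for k in range(1, N + 1)], axis=1)

# True where at least one of the N preceding rows was 1
cond = shifted.any(axis=1)

result = df.loc[(df["val"] == 0) & cond]
print(result)
```

With N = 2 this should select the same rows as the two-shift condition above; raising N widens the window after each event.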

CodePudding user response：

You can build one mask from a rolling max per group, where a new group starts at each 1->0 transition, and combine it with a second mask that selects the rows where val is 0:

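N = 2
# True on the first 0 right after a 1 (a 1->0 transition)
o2z = df['val'].diff().eq(-1)
# Within each group started by a transition, flag its first N rows
m1 = o2z.groupby(o2z.cumsum()).rolling(N, min_periods=1).max().astype(bool).values
# Rows where val is 0
m2 = df['val'].eq(0)
df[m1 & m2]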

Output:

      Day  val
8    Day9    0
9   Day10    0
15  Day16    0
16  Day17    0
21  Day22    0
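
To see how the two masks combine, the intermediate steps can be laid out side by side. This is only an illustrative sketch: the sample data is rebuilt from the question, the names steps and result are made up here, and the transition flags are cast to int and re-indexed with droplevel(0) so the masks align by index instead of by position, which is a small deviation from the snippet above.

```python
import pandas as pd

# Sample data rebuilt from the question (Day1..Day25)
vals = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
df = pd.DataFrame({"Day": [f"Day{i}" for i in range(1, len(vals) + 1)],
                   "val": vals})

N = 2

# 1) Flag each 1->0 transition (the first zero after an event)
o2z = df["val"].diff().eq(-1)

# 2) Group id increases by one at every transition
group = o2z.cumsum()

# 3) Rolling max inside each group: stays 1 for the first N rows of a group
#    that begins with a transition, 0 everywhere else
rolled = o2z.astype(int).groupby(group).rolling(N, min_periods=1).max()
m1 = rolled.droplevel(0).astype(bool)  # drop the group level, keep row index

# 4) Only rows where val is 0 qualify
m2 = df["val"].eq(0)

# Side-by-side view of every intermediate step
steps = df.assign(o2z=o2z, group=group, m1=m1, m2=m2)
print(steps)

result = df[m1 & m2]
print(result)
```

In the steps table, Day23 has m1 set because it falls within two rows of the Day22 transition, but m2 excludes it because its val is 1.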