I have a pandas DataFrame that looks like the following:
Day val
Day1 0
Day2 0
Day3 0
Day4 0
Day5 1
Day6 1
Day7 1
Day8 1
Day9 0
Day10 0
Day11 0
Day12 1
Day13 1
Day14 1
Day15 1
Day16 0
Day17 0
Day18 0
Day19 0
Day20 0
Day21 1
Day22 0
Day23 1
Day24 1
Day25 1
I want to extract at most 2 rows where val = 0, but only those that immediately follow a run of 1s.
Example: there is a run of ones from Day5 to Day8 (I call this an event). I need to look at no more than two rows after the end of the event, so here that is Day9 and Day10.
Similarly, Day21 can be treated as a single-day event, and I need to look at only Day22, since it is a zero that follows an event.
For the table above, the expected output would be the following:
Day val
Day9 0
Day10 0
Day16 0
Day17 0
Day22 0
CodePudding user response:
We can simplify the condition to: 1) val should be equal to 0, and 2) the previous day or the day before that should be 1.
In code:
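# True where the previous row, or the row before that, has val == 1
cond = (df['val'].shift(1) == 1) | (df['val'].shift(2) == 1)
# keep only the zeros that satisfy the condition
df.loc[(df['val'] == 0) & cond]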
Result:
Day val
8 Day9 0
9 Day10 0
15 Day16 0
16 Day17 0
21 Day22 0
Note: If more than 2 days should be considered, this can easily be added to the condition cond. In that case, cond can be constructed with a list comprehension and any().
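The generalisation mentioned in the note could look roughly like the sketch below. It assumes the sample frame from the question is rebuilt by hand; the function name rows_after_events and its parameter n are made up for illustration, and pd.concat(...).any(axis=1) is just one way to do the any() step.

```python
import pandas as pd

# Sample data rebuilt from the question's table (Day1..Day25).
vals = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
df = pd.DataFrame({"Day": [f"Day{i}" for i in range(1, len(vals) + 1)],
                   "val": vals})


def rows_after_events(frame, n=2):
    """Hypothetical helper: zeros that have a 1 among the previous n rows."""
    # One boolean Series per look-back distance k = 1..n.
    shifted = [frame["val"].shift(k).eq(1) for k in range(1, n + 1)]
    # True where at least one of the previous n rows is 1.
    cond = pd.concat(shifted, axis=1).any(axis=1)
    return frame.loc[frame["val"].eq(0) & cond]


print(rows_after_events(df, n=2))
```

With n=3 the same call would also return Day11 and Day18, since each of them is the third zero after an event.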
CodePudding user response:
You can compute a mask from the rolling max per group, where a new group starts at each 1->0 transition, and combine it with a second mask where the values are 0:
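N = 2
# True on each row where val drops from 1 to 0 (the first row after an event)
o2z = df['val'].diff().eq(-1)
# within each group opened by such a drop, flag the first N rows
m1 = o2z.groupby(o2z.cumsum()).rolling(N, min_periods=1).max().astype(bool).values
# rows whose own value is 0
m2 = df['val'].eq(0)
df[m1 & m2]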
output:
Day val
8 Day9 0
9 Day10 0
15 Day16 0
16 Day17 0
21 Day22 0
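For reuse, the two masks can be wrapped in a function. This is only a sketch under a few assumptions: the name first_zeros_after_events is hypothetical, the sample frame is rebuilt from the question, and the transition flags are cast to int before rolling purely as a precaution on my part.

```python
import pandas as pd

# Sample data rebuilt from the question's table (Day1..Day25).
vals = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
df = pd.DataFrame({"Day": [f"Day{i}" for i in range(1, len(vals) + 1)],
                   "val": vals})


def first_zeros_after_events(frame, n=2):
    """Hypothetical helper: up to n zeros directly after each run of 1s."""
    # True on rows where val drops from 1 to 0.
    o2z = frame["val"].diff().eq(-1)
    # Each drop opens a new group; group 0 holds the rows before the first drop.
    groups = o2z.cumsum()
    # Inside a group only the first row carries a 1, so a rolling max over
    # n rows is 1 exactly on the first n rows of every group opened by a drop.
    m1 = (o2z.astype(int)          # cast is a precaution (assumption)
             .groupby(groups)
             .rolling(n, min_periods=1)
             .max()
             .astype(bool)
             .to_numpy())
    # Rows whose own value is 0.
    m2 = frame["val"].eq(0).to_numpy()
    return frame[m1 & m2]


print(first_zeros_after_events(df, n=2))
```

Because the group labels come from a cumulative sum, they never decrease, so the grouped result comes back in the original row order and can be used as a positional mask.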