I have the following:
data = [100,100,100,100,10,100,10,100]
df = pd.DataFrame({'a':data, 'b':data})
df
a b
0 100 100
1 100 100
2 100 100
3 100 100
4 10 10
5 100 100
6 10 10
7 100 100
df['cond_a'] = (df.a == 100).rolling(3).sum() == 3
df
a b cond_a
0 100 100 False
1 100 100 False
2 100 100 True
3 100 100 True
4 10 10 False
5 100 100 False
6 10 10 False
7 100 100 False
What i would like to do now is add another column (df['cond_b_count']) which counts the occurrences of 100 values AFTER the observation of 'cond_a == true' on a rolling window of 3. so what i would end up with would be something like:
a b cond_a cond_b_count
0 100 100 False NaN
1 100 100 False NaN
2 100 100 True NaN
3 100 100 True 1.0
4 10 10 False 1.0
5 100 100 False 2.0
6 10 10 False 1.0
7 100 100 False 2.0
something like this, which is ofcourse incorrect, as it doesnt take into check first for cond_a being true before applying the rolling(3) sum.
df['cond-b'] = (df.b == 100).rolling(3).sum()
CodePudding user response:
Not sure if it's possible in one-line. Here's what I'd do:
- find the first
True
value incond_a
, let's call itoffset
; - mask
df['b']
withnan
up to thisoffset
; - find the sums in rolling windows on the masked column;
- replace zeros with
nan
up to theoffset
in the result from the previous step.
The code below is mirroring the list above.
offset = df[df['cond_a']].index[0]
s = df['b'].where(df.index > offset, pd.NA)
s = (s == 100).rolling(3).sum()
df['cond_b'] = s.where(df.index > offset, pd.NA)