Home > Blockchain >  Begin a rolling sum based on an observation in another column
Begin a rolling sum based on an observation in another column

Time:09-02

I have the following:

data = [100,100,100,100,10,100,10,100]
df = pd.DataFrame({'a':data, 'b':data})
df

    a   b
0   100 100
1   100 100
2   100 100
3   100 100
4   10  10
5   100 100
6   10  10
7   100 100

df['cond_a'] = (df.a == 100).rolling(3).sum() == 3
df

    a   b   cond_a
0   100 100 False
1   100 100 False
2   100 100 True
3   100 100 True
4   10  10  False
5   100 100 False
6   10  10  False
7   100 100 False

What i would like to do now is add another column (df['cond_b_count']) which counts the occurrences of 100 values AFTER the observation of 'cond_a == true' on a rolling window of 3. so what i would end up with would be something like:

    a   b   cond_a  cond_b_count
0   100 100 False   NaN
1   100 100 False   NaN
2   100 100 True    NaN
3   100 100 True    1.0
4   10  10  False   1.0
5   100 100 False   2.0
6   10  10  False   1.0
7   100 100 False   2.0

something like this, which is ofcourse incorrect, as it doesnt take into check first for cond_a being true before applying the rolling(3) sum.

df['cond-b'] = (df.b == 100).rolling(3).sum()

CodePudding user response:

Not sure if it's possible in one-line. Here's what I'd do:

  • find the first True value in cond_a, let's call it offset;
  • mask df['b'] with nan up to this offset;
  • find the sums in rolling windows on the masked column;
  • replace zeros with nan up to the offset in the result from the previous step.

The code below is mirroring the list above.

offset = df[df['cond_a']].index[0]
s = df['b'].where(df.index > offset, pd.NA)
s = (s == 100).rolling(3).sum()
df['cond_b'] = s.where(df.index > offset, pd.NA)
  • Related