Python Pandas Dataframe: conditional counters of one and zeros values-CodePudding

I want to add two columns in my dataframe with the following function using python:

"counter1": it counts the number of the ones in the column "case1-0". (cumulative sum of the ones)
"counter2": it counts the number of the zeros in the column "case1-0" but only if the last previous value in "counter1" is greater than 3.

case1-0	counter1	counter2
1	1	0
1	2	0
1	3	0
1	4	0
0	0	1
0	0	2
0	0	3
1	1	0
1	2	0
0	0	0
0	0	0
1	1	0

In the reality I have a time serie. Therefore the approach should be applicable for greater data frames.

CodePudding user response：

Use this solution with a mask for compare by 1 for counter11 and for counter22 is replaced not 1 values of column counter11 to missing values and forward filling them, so possible compare for greater values like 3 and pass to numpy.where values of s helper Series:

a = df['case1-0'].eq(1)
b = a.cumsum()
df['counter11'] = b.sub(b.mask(a).ffill().fillna(0)).astype(int)

a1 = ~a
b = a1.cumsum()
s = b.sub(b.mask(a1).ffill().fillna(0)).astype(int)

df['counter22'] = np.where(df['counter11'].where(a).ffill().gt(3) & a1, s, 0)
print (df)
    case1-0  counter1  counter2  counter11  counter22
0         1         1         0          1          0
1         1         2         0          2          0
2         1         3         0          3          0
3         1         4         0          4          0
4         0         0         1          0          1
5         0         0         2          0          2
6         0         0         3          0          3
7         1         1         0          1          0
8         1         2         0          2          0
9         0         0         0          0          0
10        0         0         0          0          0
11        1         1         0          1          0

Explanation how it working:

print (df.assign(counter0 = s,
                 replaced=df['counter11'].where(a),
                 ffill= df['counter11'].where(a).ffill(),
                 mask = df['counter11'].where(a).ffill().gt(3),
                 chained = df['counter11'].where(a).ffill().gt(3) & a1,
                 counter22 = np.where(df['counter11'].where(a).ffill().gt(3) & a1, s, 0)))

    case1-0  counter1  counter2  counter11  counter0  replaced  ffill   mask  \
0         1         1         0          1         0       1.0    1.0  False   
1         1         2         0          2         0       2.0    2.0  False   
2         1         3         0          3         0       3.0    3.0  False   
3         1         4         0          4         0       4.0    4.0   True   
4         0         0         1          0         1       NaN    4.0   True   
5         0         0         2          0         2       NaN    4.0   True   
6         0         0         3          0         3       NaN    4.0   True   
7         1         1         0          1         0       1.0    1.0  False   
8         1         2         0          2         0       2.0    2.0  False   
9         0         0         0          0         1       NaN    2.0  False   
10        0         0         0          0         2       NaN    2.0  False   
11        1         1         0          1         0       1.0    1.0  False   

    chained  counter22  
0     False          0  
1     False          0  
2     False          0  
3     False          0  
4      True          1  
5      True          2  
6      True          3  
7     False          0  
8     False          0  
9     False          0  
10    False          0  
11    False          0

Alternative solution

counter is for count consecutive 0 and 1 values by compare shifted values for groups with GroupBy.cumcount, then for counter11 set 0 if not match condition mask in Series.where and for counter22 is used same solution like above (only changed variables):

mask = df['case1-0'].eq(1)
counter = df.groupby(df['case1-0'].ne(df['case1-0'].shift()).cumsum()).cumcount().add(1)
df['counter11'] = counter.where(mask, 0)
df['counter22'] = np.where(df['counter11'].where(mask).ffill().gt(3) & ~mask, counter, 0)

print (df)
    case1-0  counter1  counter2  counter11  counter22
0         1         1         0          1          0
1         1         2         0          2          0
2         1         3         0          3          0
3         1         4         0          4          0
4         0         0         1          0          1
5         0         0         2          0          2
6         0         0         3          0          3
7         1         1         0          1          0
8         1         2         0          2          0
9         0         0         0          0          0
10        0         0         0          0          0
11        1         1         0          1          0