I want to add two columns in my dataframe with the following function using python:
- "counter1": it counts the number of the ones in the column "case1-0". (cumulative sum of the ones)
- "counter2": it counts the number of the zeros in the column "case1-0" but only if the last previous value in "counter1" is greater than 3.
case1-0 | counter1 | counter2 |
---|---|---|
1 | 1 | 0 |
1 | 2 | 0 |
1 | 3 | 0 |
1 | 4 | 0 |
0 | 0 | 1 |
0 | 0 | 2 |
0 | 0 | 3 |
1 | 1 | 0 |
1 | 2 | 0 |
0 | 0 | 0 |
0 | 0 | 0 |
1 | 1 | 0 |
In the reality I have a time serie. Therefore the approach should be applicable for greater data frames.
CodePudding user response:
Use this solution
with a
mask for compare by 1
for counter11
and for counter22
is replaced not 1
values of column counter11
to missing values and forward filling them, so possible compare for greater values like 3
and pass to numpy.where
values of s
helper Series
:
a = df['case1-0'].eq(1)
b = a.cumsum()
df['counter11'] = b.sub(b.mask(a).ffill().fillna(0)).astype(int)
a1 = ~a
b = a1.cumsum()
s = b.sub(b.mask(a1).ffill().fillna(0)).astype(int)
df['counter22'] = np.where(df['counter11'].where(a).ffill().gt(3) & a1, s, 0)
print (df)
case1-0 counter1 counter2 counter11 counter22
0 1 1 0 1 0
1 1 2 0 2 0
2 1 3 0 3 0
3 1 4 0 4 0
4 0 0 1 0 1
5 0 0 2 0 2
6 0 0 3 0 3
7 1 1 0 1 0
8 1 2 0 2 0
9 0 0 0 0 0
10 0 0 0 0 0
11 1 1 0 1 0
Explanation how it working:
print (df.assign(counter0 = s,
replaced=df['counter11'].where(a),
ffill= df['counter11'].where(a).ffill(),
mask = df['counter11'].where(a).ffill().gt(3),
chained = df['counter11'].where(a).ffill().gt(3) & a1,
counter22 = np.where(df['counter11'].where(a).ffill().gt(3) & a1, s, 0)))
case1-0 counter1 counter2 counter11 counter0 replaced ffill mask \
0 1 1 0 1 0 1.0 1.0 False
1 1 2 0 2 0 2.0 2.0 False
2 1 3 0 3 0 3.0 3.0 False
3 1 4 0 4 0 4.0 4.0 True
4 0 0 1 0 1 NaN 4.0 True
5 0 0 2 0 2 NaN 4.0 True
6 0 0 3 0 3 NaN 4.0 True
7 1 1 0 1 0 1.0 1.0 False
8 1 2 0 2 0 2.0 2.0 False
9 0 0 0 0 1 NaN 2.0 False
10 0 0 0 0 2 NaN 2.0 False
11 1 1 0 1 0 1.0 1.0 False
chained counter22
0 False 0
1 False 0
2 False 0
3 False 0
4 True 1
5 True 2
6 True 3
7 False 0
8 False 0
9 False 0
10 False 0
11 False 0
Alternative solution
counter
is for count consecutive 0
and 1
values by compare shifted values for groups with GroupBy.cumcount
, then for counter11
set 0 if not match condition mask
in Series.where
and for counter22
is used same solution like above (only changed variables):
mask = df['case1-0'].eq(1)
counter = df.groupby(df['case1-0'].ne(df['case1-0'].shift()).cumsum()).cumcount().add(1)
df['counter11'] = counter.where(mask, 0)
df['counter22'] = np.where(df['counter11'].where(mask).ffill().gt(3) & ~mask, counter, 0)
print (df)
case1-0 counter1 counter2 counter11 counter22
0 1 1 0 1 0
1 1 2 0 2 0
2 1 3 0 3 0
3 1 4 0 4 0
4 0 0 1 0 1
5 0 0 2 0 2
6 0 0 3 0 3
7 1 1 0 1 0
8 1 2 0 2 0
9 0 0 0 0 0
10 0 0 0 0 0
11 1 1 0 1 0