I have a boolean True
/False
-column "Mask
" in a dataframe, e.g.:
Mask
True
True
True
False
False
True
False
False
Now I am trying to add a column with the count of the consecutive True
/False
lines, where True
is a positive sum (counts of 1) and False
is a negative sum (counts of -1), e.g.
Mask Count
True 3
True 3
True 3
False -2
False -2
True 1
False -2
False -2
I tried it with groupby
and sum
but now I got a knot in my head.
Tried something like
mask.groupby((~mask).cumsum()).cumsum().astype(int)
(mask
is the condition for the True
/False
) but this only counts the Trues and does a count instead of showing the sum.
Would really appreciate any suggestions!
CodePudding user response:
You can get the group number of consecutive True
/False
by .cumsum()
and put into g
.
Then, group by g
and get the size/count of each group by .transform()
.size()
. Set the sign by multiplying the return value (1
or -1
) of np.where()
, as follows:
g = df['Mask'].ne(df['Mask'].shift()).cumsum()
df['Count'] = df.groupby(g)['Mask'].transform('size') * np.where(df['Mask'], 1, -1)
Result:
print(df)
Mask Count
0 True 3
1 True 3
2 True 3
3 False -2
4 False -2
5 True 1
6 False -2
7 False -2