Home > Enterprise >  Pandas dataframe: count consecutive True / False values
Pandas dataframe: count consecutive True / False values

Time:10-25

I have a boolean True/False-column "Mask" in a dataframe, e.g.:

Mask
True
True
True
False
False
True
False
False

Now I am trying to add a column with the count of the consecutive True/False lines, where True is a positive sum (counts of 1) and False is a negative sum (counts of -1), e.g.

Mask     Count
True     3
True     3
True     3
False   -2
False   -2
True     1
False   -2
False   -2

I tried it with groupby and sum but now I got a knot in my head.

Tried something like

mask.groupby((~mask).cumsum()).cumsum().astype(int)

(mask is the condition for the True/False) but this only counts the Trues and does a count instead of showing the sum.

Would really appreciate any suggestions!

CodePudding user response:

You can get the group number of consecutive True/False by .cumsum() and put into g.

Then, group by g and get the size/count of each group by .transform() .size(). Set the sign by multiplying the return value (1 or -1) of np.where(), as follows:

g = df['Mask'].ne(df['Mask'].shift()).cumsum()

df['Count'] = df.groupby(g)['Mask'].transform('size') * np.where(df['Mask'], 1, -1)

Result:

print(df)

    Mask  Count
0   True      3
1   True      3
2   True      3
3  False     -2
4  False     -2
5   True      1
6  False     -2
7  False     -2
  • Related