Count cumulative true Value

I have the following column with True and False boolean values. I want to create a new column that holds a cumulative sum of the True values and resets the count to 0 whenever the value is False, like this:

    bool  count
0  False      0
1   True      1
2   True      2
3   True      3
4  False      0
5   True      1
6   True      2
7  False      0
8  False      0

CodePudding user response：

Yes, this can be done, using a series of steps:

df['count'] = df.groupby(df['bool'].astype(int).diff().ne(0).cumsum())['bool'].cumsum()

Output:

>>> df

    bool  count
0  False      0
1   True      1
2   True      2
3   True      3
4  False      0
5   True      1
6   True      2
7  False      0
8  False      0

Explanation:

This code puts each run of consecutive equal values into its own group (so every stretch of trues and every stretch of falses is a separate group), then, treating the trues as 1's and the falses as 0's, computes the cumulative sum within each group. The per-group results keep the original index, so they can be assigned straight back as a new column.

df.groupby(...) - The grouping key is built in four steps:
1. df['bool'].astype(int) - Converts each value of bool to an int (True -> 1, False -> 0).
2. .diff() - For each integer value, computes the difference between it and the previous value: 1 if the previous value was False and this one is True (1 - 0), 0 if the value did not change (1 - 1 or 0 - 0), and -1 if the previous value was True and this one is False (0 - 1). The first row has no previous value, so it gets NaN.
3. .ne(0) - Converts all values that are not equal to 0 to True, and zeros to False (because (0 != 0) == False). NaN also counts as not equal to 0, so the first row becomes True.
4. .cumsum() - Calculates the running total of those True (1) values. The total goes up by one every time the value changes, so each run of consecutive equal values gets its own unique number. That number is passed to the groupby() call as the key, which groups each run of trues (and each run of falses) separately.
['bool'].cumsum() - Within each group, takes the cumulative sum of bool. A group of trues (1s) counts 1, 2, 3, ...; a group of falses (0s) stays at 0.
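The steps above can be sketched with each intermediate result kept in its own variable. The names as_int, delta, changed and run_id are made up here for illustration, and the frame is assumed to hold the values from the question. The last line sums the integer version of the column, which should give the same counts as summing bool directly:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"bool": [False, True, True, True, False, True, True, False, False]}
)

# Step 1: True -> 1, False -> 0.
as_int = df["bool"].astype(int)

# Step 2: difference from the previous row (NaN for the first row).
delta = as_int.diff()

# Step 3: True wherever the value changed (NaN != 0 is also True).
changed = delta.ne(0)

# Step 4: running total of the changes, so each run gets its own id.
run_id = changed.cumsum()

# Cumulative sum inside each run: trues count up, falses stay at 0.
df["count"] = as_int.groupby(run_id).cumsum()

print(df.assign(run_id=run_id))
```

Printing run_id next to the result makes it easier to see that rows 1 to 3 share one id and rows 5 to 6 share another, which is why the count restarts after each False.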

CodePudding user response：

There might be a more Pythonic way to do it, but here is a simple iterative approach:

current_count = 0  # length of the current run of True values
for index, row in data.iterrows():
    if row['bool']:
        current_count += 1  # a True extends the run
    else:
        current_count = 0   # a False resets the run
    data.at[index, 'count'] = current_count

Here, data is the DataFrame that initially has one column named bool; the loop adds the count column.
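For anyone who wants to try the iterative idea without iterrows(), here is a minimal self-contained sketch. The helper name running_true_count is hypothetical (it is not part of pandas), and the sample frame is assumed to match the question. It walks the column once, builds a plain list, and assigns it in one go:

```python
import pandas as pd


def running_true_count(flags):
    """Return a list where each True extends the streak and each False resets it to 0."""
    counts = []
    current = 0
    for flag in flags:
        # Extend the streak on True, reset it on False.
        current = current + 1 if flag else 0
        counts.append(current)
    return counts


# Sample data copied from the question.
data = pd.DataFrame(
    {"bool": [False, True, True, True, False, True, True, False, False]}
)
data["count"] = running_true_count(data["bool"])

print(data)
```

Assigning the whole list at once keeps the count column as integers and avoids writing to the frame row by row.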