Home > OS >  Cumulative count resetting to and staying 0 based on a condition in pandas
Cumulative count resetting to and staying 0 based on a condition in pandas

Time:07-19

I am trying to create a column 'count' on a pandas DF that cumulatively counts when field 'boolean' is True but resets and stays at 0 when 'boolean' is False. Also needs to be grouped by the ID column, so the count resets when looking at a new ID. No loops please as working with a big data set

Used the code from the following question which works but need to add a group by to include the ID column grouping

Pandas Dataframe - Row Iteration with Resetting Count-Value by Condition without loop

Expected output below: (ID, Boolean columns already exist, just need to create Count)

ID  Boolean  Count
1   True     1
1   True     2
1   True     3
1   True     4
1   True     5
1   False    0
1   False    0 
1   False    0
1   False    0
1   True     1
1   True     2
1   True     3
2   True     1
2   True     2
2   True     3
2   True     4
2   False    0
2   False    0
2   False    0
2   True     1
2   True     2
2   True     3

CodePudding user response:

Identify blocks by using cumsum on inverted boolean mask, then group the dataframe by ID and blocks and use cumsum on Boolean to create a counter

b = (~df['Boolean']).cumsum()
df['Count'] = df.groupby(['ID', b])['Boolean'].cumsum()

    ID  Boolean  Count
0    1     True      1
1    1     True      2
2    1     True      3
3    1     True      4
4    1     True      5
5    1    False      0
6    1    False      0
7    1    False      0
8    1    False      0
9    1     True      1
10   1     True      2
11   1     True      3
12   2     True      1
13   2     True      2
14   2     True      3
15   2     True      4
16   2    False      0
17   2    False      0
18   2    False      0
19   2     True      1
20   2     True      2
21   2     True      3

CodePudding user response:

df['Count'] = df.groupby('ID')['Boolean'].diff()
df = df.fillna(False)
df['Count'] = df.groupby('ID')['Count'].cumsum()
df['Count'] = df.groupby(['ID', 'Count'])['Boolean'].cumsum()
df

    ID  Boolean  Count
0    1     True      1
1    1     True      2
2    1     True      3
3    1     True      4
4    1     True      5
5    1    False      0
6    1    False      0
7    1    False      0
8    1    False      0
9    1     True      1
10   1     True      2
11   1     True      3
12   2     True      1
13   2     True      2
14   2     True      3
15   2     True      4
16   2    False      0
17   2    False      0
18   2    False      0
19   2     True      1
20   2     True      2
21   2     True      3

CodePudding user response:

You can use a column shift for ID and Boolean columns to identify the groups to do the groupby on. Then do a cumsum for each of those groups.

groups = ((df['ID']!=df['ID'].shift()) | (df['Boolean']!=df['Boolean'].shift())).cumsum()

df.assign(Count2=df.groupby(groups)['Boolean'].cumsum())

Result

    ID  Boolean  Count  Count2
0    1     True      1       1
1    1     True      2       2
2    1     True      3       3
3    1     True      4       4
4    1     True      5       5
5    1    False      0       0
6    1    False      0       0
7    1    False      0       0
8    1    False      0       0
9    1     True      1       1
10   1     True      2       2
11   1     True      3       3
12   2     True      1       1
13   2     True      2       2
14   2     True      3       3
15   2     True      4       4
16   2    False      0       0
17   2    False      0       0
18   2    False      0       0
19   2     True      1       1
20   2     True      2       2
21   2     True      3       3
  • Related