groupby cumsum (or cumcount) with cyclical data

Time:02-06

I have a dataframe that looks like this:

ID SWITCH
A ON
A ON
A ON
A OFF
A OFF
A OFF
A ON
A ON
A ON
... ...
B ON
B ON
B ON
B OFF
B OFF
B OFF
B ON
B ON
B ON

The 'SWITCH' column is cyclical, and I'd like to count the ON and OFF rows within each cycle, like this:

ID SWITCH Cum. Count
A ON 1
A ON 2
A ON 3
A OFF 1
A OFF 2
A OFF 3
A ON 1
A ON 2
A ON 3
... ...
B ON 1
B ON 2
B ON 3
B OFF 1
B OFF 2
B OFF 3
B ON 1
B ON 2
B ON 3

I've tried 'cumsum' and 'cumcount', but neither resets the count when the next 'ON' cycle starts (the count carries on from the previous cycle).

What can I do?

Thanks in advance!

CodePudding user response:

Try grouping by the cumsum of the changes in 'SWITCH' as well, so that each run of identical values gets its own group:

switch_blocks = df['SWITCH'].ne(df['SWITCH'].shift()).cumsum()

df['cum.count'] = df.groupby(['ID', switch_blocks]).cumcount().add(1)
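
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The sample frame is an assumption reconstructed from the question (ON/OFF/ON runs of three rows per ID); the real data may differ.

```python
import pandas as pd

# Assumed sample data mirroring the question: ON, OFF, ON runs of three per ID.
df = pd.DataFrame({
    'ID': ['A'] * 9 + ['B'] * 9,
    'SWITCH': (['ON'] * 3 + ['OFF'] * 3 + ['ON'] * 3) * 2,
})

# A new block starts wherever SWITCH differs from the previous row.
switch_blocks = df['SWITCH'].ne(df['SWITCH'].shift()).cumsum()

# Count rows within each (ID, block) pair, starting from 1.
df['cum.count'] = df.groupby(['ID', switch_blocks]).cumcount().add(1)

print(df)
```

Grouping on 'ID' as well as the block label matters here: the last row of A and the first row of B are both ON, so they share a block label, and only the ID keeps their counts apart.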

CodePudding user response:

You need to create a new column which indicates the change in the 'SWITCH' column, then you can use 'groupby' to perform the cumulative count.

import pandas as pd

# Create sample data
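df = pd.DataFrame({'ID': ['A'] * 9 + ['B'] * 9,
                   'SWITCH': ['ON'] * 3 + ['OFF'] * 3 + ['ON'] * 3 + ['ON'] * 3 + ['OFF'] * 3 + ['OFF'] * 3})

# Flag rows where SWITCH differs from the previous row
df['SWITCH_CHANGE'] = (df['SWITCH'] != df['SWITCH'].shift()).astype(int)

# The running sum of the flags labels each run; count rows within each (ID, run) pair
df['Cum. Count'] = df.groupby(['ID', df.SWITCH_CHANGE.cumsum()])['SWITCH'].cumcount() + 1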

print(df)

Result:

   ID SWITCH  SWITCH_CHANGE  Cum. Count
0   A     ON              1           1
1   A     ON              0           2
2   A     ON              0           3
3   A    OFF              1           1
4   A    OFF              0           2
5   A    OFF              0           3
6   A     ON              1           1
7   A     ON              0           2
8   A     ON              0           3
9   B     ON              0           1
10  B     ON              0           2
11  B     ON              0           3
12  B    OFF              1           1
13  B    OFF              0           2
14  B    OFF              0           3
15  B    OFF              0           4
16  B    OFF              0           5
17  B    OFF              0           6
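
Note that row 9 has SWITCH_CHANGE 0 even though it is the first row of B, because the comparison looks at the previous row regardless of ID. The count is still right since 'ID' is one of the grouping keys. If you want the flag itself to be 1 at the start of every ID, one possible variation is to shift within each ID. This is only a sketch; prev_switch is a helper name I made up, and it assumes the rows of each ID are contiguous.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({
    'ID': ['A'] * 9 + ['B'] * 9,
    'SWITCH': ['ON'] * 3 + ['OFF'] * 3 + ['ON'] * 3 + ['ON'] * 3 + ['OFF'] * 6,
})

# Previous SWITCH value within the same ID (missing on each ID's first row).
prev_switch = df.groupby('ID')['SWITCH'].shift()

# 1 at the first row of every run, including the first row of each ID.
df['SWITCH_CHANGE'] = df['SWITCH'].ne(prev_switch).astype(int)

# Same grouping as before: ID plus the running total of change flags.
df['Cum. Count'] = df.groupby(['ID', df['SWITCH_CHANGE'].cumsum()]).cumcount() + 1

print(df)
```

The 'Cum. Count' column comes out the same as in the result above; only the flag on row 9 changes from 0 to 1.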