Count groups of consecutive values in pandas

I have a pandas Series of 0s and 1s and I would like to count the groups of consecutive 1s (ignoring the 0s) with a pandas solution (no itertools, no Python-level iteration).

Other SO posts suggest methods based on shift()/diff()/cumsum(), which do not seem to work when the data starts with 0.

import pandas

df = pandas.Series([0,1,1,1,0,0,1,0,1,1,0,1,1]) # should give 4
df = pandas.Series([1,1,0,0,1,0,1,1,0,1,1])     # should also give 4
df = pandas.Series([1,1,1,1,1,0,1])             # should give 2

Any ideas?

CodePudding user response：

If you only have 0/1, you can use:

import pandas as pd

s = pd.Series([0,1,1,1,0,0,1,0,1,1,0,1,1])

count = s.diff().fillna(s).eq(1).sum()

output: 4 (4 and 2 for the other two)

diff() leaves a NaN in the first position; fillna(s) replaces it with the first value of s, which ensures that a Series starting with 1 is counted.
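
To make the role of each step visible, here is a minimal sketch that splits the one-liner into intermediate variables. The names d, filled and starts are only illustrative, and the input is the second example from the question (the one that starts with 1).

```python
import pandas as pd

s = pd.Series([1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1])  # second example from the question

d = s.diff()            # NaN for the first element, then s[i] - s[i-1]
filled = d.fillna(s)    # the leading NaN is replaced by s[0] (fill is aligned on the index)
starts = filled.eq(1)   # True wherever a run of 1s begins
count = int(starts.sum())

print(starts[starts].index.tolist())  # positions where a run of 1s starts
print(count)
```

Without the fillna step, the run that opens at position 0 would be missed, because there is no previous element for diff() to compare against.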

Faster alternative

Use diff(), count the 1s, and correct the result with the first item:

count = s.diff().eq(1).sum() + (s.iloc[0] == 1)
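
As a quick check, the same idea can be wrapped in a small function and run against the three examples from the question. This is only a sketch: count_runs_of_ones is a hypothetical helper name, and the empty-Series guard is an addition that the answer does not discuss.

```python
import pandas as pd

def count_runs_of_ones(s: pd.Series) -> int:
    """Count runs of consecutive 1s in a 0/1 Series (hypothetical helper)."""
    if s.empty:                 # assumption: an empty Series has zero runs
        return 0
    rises = int(s.diff().eq(1).sum())        # 0 -> 1 transitions after the first element
    starts_with_one = int(s.iloc[0] == 1)    # a leading 1 opens a run that diff() cannot see
    return rises + starts_with_one

cases = [
    [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 0, 1],
]
results = [count_runs_of_ones(pd.Series(c)) for c in cases]
print(results)  # [4, 4, 2]
```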

[Figure: comparison of different pandas approaches]
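
To reproduce such a comparison locally, a rough timing harness could look like the sketch below. The input size, the random seed and the number of repetitions are arbitrary assumptions, and only the two approaches from this answer are included.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a random 0/1 Series of 100_000 elements is a reasonable benchmark input.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 2, size=100_000))

approaches = {
    "diff + fillna": lambda: s.diff().fillna(s).eq(1).sum(),
    "diff + first item": lambda: s.diff().eq(1).sum() + (s.iloc[0] == 1),
}

# Both approaches should agree before their timings are compared.
counts = {name: int(fn()) for name, fn in approaches.items()}

for name, fn in approaches.items():
    seconds = timeit.timeit(fn, number=20) / 20   # average over 20 calls
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Absolute timings depend on the machine and the pandas version, so no numbers are quoted here.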

CodePudding user response：

Let us identify the different groups of 1s using cumsum, then use nunique to count the number of unique groups:

m = df.eq(0)
m.cumsum()[~m].nunique()

Result

case 1: 4
case 2: 4
case 3: 2
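
A step-by-step sketch of the cumsum labeling, using the first example from the question, may help show why it works. The names group_id and ones_only are only illustrative.

```python
import pandas as pd

s = pd.Series([0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1])  # first example from the question

m = s.eq(0)               # True at every 0
group_id = m.cumsum()     # increases at each 0, so it stays constant within a run of 1s
ones_only = group_id[~m]  # keep only the positions that hold a 1
count = ones_only.nunique()

print(ones_only.tolist())  # one label per 1; consecutive 1s share a label
print(count)
```

Because the counter only advances on a 0, two 1s share a label exactly when no 0 lies between them, so the number of distinct labels equals the number of runs.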