Cumcount with reset in Python

I have the following problem. I need to compute a cumcount, but I would like the counter to reset whenever the series is interrupted, that is, whenever the value in col_1 changes. See the example:

import pandas as pd
data = { 'col_1': ['a', 'a', 'b', 'b', 'a'], 'col_2': [3, 2, 1, 0, -3]}
df = pd.DataFrame.from_dict(data)

I tried this, but it gives the wrong output, because it counts all rows with the same col_1 value together (the last 'a' gets 2 instead of 0):

df["seq"] = df.groupby(["col_1"]).cumcount()

What I want is:

data = { 'col_1': ['a', 'a', 'b', 'b', 'a'], 'col_2': [3, 2, 1, 0, -3], 'seq': [0, 1, 0, 1, 0]}

How can I do it, please?

CodePudding user response:

Try:

df["seq"] = df.groupby((df["col_1"] != df["col_1"].shift()).cumsum())["col_1"].cumcount()
print(df)

Output

  col_1  col_2  seq
0     a      3    0
1     a      2    1
2     b      1    0
3     b      0    1
4     a     -3    0
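
To see why this works, it may help to look at the intermediate steps. The sketch below splits the one-liner into parts; the names `changed` and `run_id` are only illustrative and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame({"col_1": ["a", "a", "b", "b", "a"], "col_2": [3, 2, 1, 0, -3]})

# True wherever the value differs from the previous row. The first row is
# always True, because shift() leaves a missing value there.
changed = df["col_1"] != df["col_1"].shift()

# The running total of those flags gives each consecutive run its own id:
# 1, 1, 2, 2, 3 for this data.
run_id = changed.cumsum()

# Counting within each run id restarts the counter whenever the value changes.
df["seq"] = df.groupby(run_id).cumcount()
print(df)
```

Grouping by `run_id` instead of by `col_1` is what keeps the two separate runs of 'a' apart.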

CodePudding user response:

Since you are interested in runs of consecutive equal values (as in run-length encoding), itertools.groupby might be better suited to this task. Consider the following example:

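import itertools
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b', 'a']})
# itertools.groupby yields one group per run of consecutive equal values
df['seq'] = [i for k, g in itertools.groupby(df['col1']) for i in range(len(list(g)))]
print(df)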

output

  col1  seq
0    a    0
1    a    1
2    b    0
3    b    1
4    a    0
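
If you need this in more than one place, the same idea could be wrapped in a small helper that works on any iterable. This is only a sketch; the name `run_counter` is made up for illustration and is not part of pandas or itertools.

```python
import itertools


def run_counter(values):
    """Return the 0-based position of each item within its run of equal consecutive values."""
    # itertools.groupby starts a new group each time the value changes,
    # so enumerate() restarts at 0 for every run.
    return [i for _, group in itertools.groupby(values) for i, _ in enumerate(group)]


print(run_counter(["a", "a", "b", "b", "a"]))  # [0, 1, 0, 1, 0]
```

With a DataFrame, the result can be assigned directly, for example `df["seq"] = run_counter(df["col1"])`.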