Counting consecutive values: group size on the first occurrence and 0 on the rest

I have a DataFrame with a column such as pd.Series(["a","a","b","c","d","d"]). I want to count consecutive values so that the first row of each run holds the size of that run and every later row in the run holds 0, like this:

       A        B
       a        2
       a        0
       b        1
       c        1
       d        2
       d        0

CodePudding user response：

IIUC, you want the count of consecutive values on the first row of each group and 0 on the other rows.

You can use groupby.transform('size') to get the size of each group, then mask the duplicated rows so that only the first value is kept:

import pandas as pd

# input, or use df['A'] if the data is in a DataFrame
s = pd.Series(["a","a","b","c","d","d"])

# label each run of consecutive equal values
group = s.ne(s.shift()).cumsum()

# compute the size of each run and keep it only on the first row of the run
# (assign with df['B'] = out if working in a DataFrame)
g = s.groupby(group)
out = g.transform('size').mask(group.duplicated(), 0)
print(out)

output:

0    2
1    0
2    1
3    1
4    2
5    0
dtype: int64
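
The same steps can be wrapped in a small helper. This is only a sketch: the name first_run_size is hypothetical (it is not a pandas function), and the input is assumed to be a Series with a unique index. The sample data repeats "a" in two separate runs to show that runs are counted, not values.

```python
import pandas as pd


def first_run_size(s: pd.Series) -> pd.Series:
    """Return the run length on the first row of each run and 0 elsewhere.

    Hypothetical helper for illustration; assumes a unique index.
    """
    # A new run starts wherever a value differs from the previous one.
    run_id = s.ne(s.shift()).cumsum()
    # Broadcast each run's size to every row of that run.
    size = s.groupby(run_id).transform("size")
    # Keep the size only where the run label appears for the first time.
    return size.mask(run_id.duplicated(), 0)


result = first_run_size(pd.Series(["a", "a", "b", "a", "a", "a"]))
print(result.tolist())  # [2, 0, 1, 3, 0, 0]
```

The second run of "a" gets its own count of 3 because the grouping key is the run label, not the value itself.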

CodePudding user response：

Create groups of consecutive values by comparing each value with the previous (shifted) one:

import pandas as pd

s = pd.Series(["a","a","b","c","d","d"])

g = s.ne(s.shift()).cumsum()
print(g)
0    1
1    1
2    2
3    3
4    4
5    4
dtype: int32

Then map each group label to its size with Series.value_counts, and set 0 for the duplicated labels with mask:

print(g.map(g.value_counts()))
0    2
1    2
2    1
3    1
4    2
5    2
dtype: int64

out = g.map(g.value_counts()).mask(g.duplicated(), 0)
print(out)
0    2
1    0
2    1
3    1
4    2
5    0
dtype: int64

Finally, build the DataFrame:

df = pd.DataFrame({'A':s, 'B': out})
print(df)
   A  B
0  a  2
1  a  0
2  b  1
3  c  1
4  d  2
5  d  0
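
As a cross-check that does not depend on pandas, the same column can be rebuilt with itertools.groupby from the standard library, which also groups consecutive equal items. This is a sketch for verification, not a replacement for the vectorized version; the names values and expected are illustrative.

```python
from itertools import groupby

values = ["a", "a", "b", "c", "d", "d"]

expected = []
for _, run in groupby(values):
    n = len(list(run))                   # length of this run of equal values
    expected.extend([n] + [0] * (n - 1)) # size first, then zeros for the rest

print(expected)  # [2, 0, 1, 1, 2, 0]
```

Comparing this list with out.tolist() from the pandas version is a quick way to confirm that both give the same result on your own data.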