If a DataFrame column holds ["a","a","b","c","d","d"], I want to count runs of consecutive values: the first occurrence in each run should get the run length and the following occurrences should be 0, like this:
A B
a 2
a 0
b 1
c 1
d 2
d 0
Answer:
IIUC, you want the count of consecutive values on the first row of each group, and 0 otherwise.
You can use groupby + transform('size') to get the size per group, and mask + duplicated to keep only the first value:
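import pandas as pd

# input (or use df['colA'] if it is a DataFrame column)
s = pd.Series(["a","a","b","c","d","d"])
# label groups of consecutive values: a new group starts whenever the value changes
group = s.ne(s.shift()).cumsum()
# size of each group, kept only on its first row (0 on the repeated rows)
g = s.groupby(group)
out = g.transform('size').mask(group.duplicated(), 0)
print(out)
# for a DataFrame: df['colB'] = out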
output:
0 2
1 0
2 1
3 1
4 2
5 0
dtype: int64
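A variant of the same idea, as a sketch: here GroupBy.cumcount replaces duplicated as the "first row of the run" test (a swapped-in technique, not the one used in the answer above). cumcount is 0 on the first row of every group, so comparing it with 0 marks exactly the rows that keep the size:

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "c", "d", "d"])

# number each run of consecutive equal values (1, 1, 2, 3, 4, 4)
group = s.ne(s.shift()).cumsum()
g = s.groupby(group)

# cumcount() is 0 on the first row of each run, so this marks the rows to keep
is_first = g.cumcount().eq(0)

# run length on the first row of each run, 0 on the following rows
out = g.transform("size").where(is_first, 0)
print(out.tolist())  # [2, 0, 1, 1, 2, 0]
```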
Answer:
Create groups of consecutive values by comparing with the shifted values:
import pandas as pd

s = pd.Series(["a","a","b","c","d","d"])
g = s.ne(s.shift()).cumsum()
print (g)
0 1
1 1
2 2
3 3
4 4
5 4
dtype: int32
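To see why comparing with the shifted Series produces these labels, the intermediate results can be laid out side by side. This is only an illustrative sketch; the column names value, previous, run_start and group are made up for display:

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "c", "d", "d"])

steps = pd.DataFrame({
    "value": s,
    "previous": s.shift(),         # value of the row above (NaN for the first row)
    "run_start": s.ne(s.shift()),  # True where the value differs from the row above
})
# the running count of run starts gives every run its own label
steps["group"] = steps["run_start"].cumsum()
print(steps)
```

The first row always starts a run, because comparing "a" with NaN via ne returns True.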
Then map each group label to its count with Series.value_counts, and set 0 for the duplicates:
print (g.map(g.value_counts()))
0 2
1 2
2 1
3 1
4 2
5 2
dtype: int64
out = g.map(g.value_counts()).mask(g.duplicated(), 0)
print (out)
0 2
1 0
2 1
3 1
4 2
5 0
dtype: int64
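Mapping through value_counts and grouping the labels by themselves should give the same per-row run length. A small sketch to check that, assuming the same sample data (via_map and via_transform are illustrative names):

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "c", "d", "d"])
g = s.ne(s.shift()).cumsum()

# run length per row via a lookup table: label -> number of rows with that label
via_map = g.map(g.value_counts())

# the same run length computed per group, without building a lookup table
via_transform = g.groupby(g).transform("size")

print(via_map.tolist())                              # [2, 2, 1, 1, 2, 2]
print(via_map.tolist() == via_transform.tolist())    # True
```

The lists are compared rather than the Series themselves so that the check does not depend on dtype or name details.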
Finally, for a DataFrame use:
df = pd.DataFrame({'A':s, 'B': out})
print (df)
A B
0 a 2
1 a 0
2 b 1
3 c 1
4 d 2
5 d 0
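The steps can be wrapped into one helper that works directly on a DataFrame column. This is a sketch only: the function name first_run_length is hypothetical, and the trailing extra "a" row is made-up data showing that a value that reappears later starts a new run instead of being counted with the first "a"s:

```python
import pandas as pd


def first_run_length(col: pd.Series) -> pd.Series:
    """Run length on the first row of each run of consecutive equal values, 0 elsewhere.

    Hypothetical helper combining the value_counts + duplicated steps above.
    """
    group = col.ne(col.shift()).cumsum()       # label runs of consecutive values
    sizes = group.map(group.value_counts())    # run length on every row
    return sizes.mask(group.duplicated(), 0)   # keep it only on the first row of each run


df = pd.DataFrame({"A": ["a", "a", "b", "c", "d", "d", "a"]})
df["B"] = first_run_length(df["A"])
print(df)
```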