I want to find the cumulative count of rows since the last change in value, i.e. how many rows have passed since the value last changed. For illustration:
| Value | diff | # rows since last change (how do I create this column?) |
|---|---|---|
6 | na | na |
5 | -1 | 0 |
5 | 0 | 1 |
5 | 0 | 2 |
4 | -1 | 0 |
4 | 0 | 1 |
4 | 0 | 2 |
4 | 0 | 3 |
4 | 0 | 4 |
5 | 1 | 0 |
5 | 0 | 1 |
5 | 0 | 2 |
5 | 0 | 3 |
6 | 1 | 0 |
7 | 1 | 0 |
I tried to use `cumsum`, but it does not reset after each change.
CodePudding user response:
IIUC, use a `cumcount` per group:
df['new'] = df.groupby(df['Value'].ne(df['Value'].shift()).cumsum()).cumcount()
output:
Value diff new
0      6  NaN    0
1 5 -1 0
2 5 0 1
3 5 0 2
4 4 -1 0
5 4 0 1
6 4 0 2
7 4 0 3
8 4 0 4
9 5 1 0
10 5 0 1
11 5 0 2
12 5 0 3
13 6 1 0
14 7 1 0
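To see why this works, note that `df['Value'].ne(df['Value'].shift())` is True at every row where the value differs from the previous row; taking its `cumsum` turns each run of equal values into a distinct group id, and `cumcount` then numbers the rows 0, 1, 2, ... within each run. A self-contained sketch (data reconstructed from the question):

```python
import pandas as pd

df = pd.DataFrame({'Value': [6, 5, 5, 5, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7]})

# True wherever the value changes from the previous row;
# cumsum converts that into an increasing group id per run
group_id = df['Value'].ne(df['Value'].shift()).cumsum()

# cumcount numbers rows 0, 1, 2, ... within each run of equal values
df['new'] = df.groupby(group_id).cumcount()
print(df['new'].tolist())
# [0, 0, 1, 2, 0, 1, 2, 3, 4, 0, 1, 2, 3, 0, 0]
```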
If you want the NaN based on `diff`, you can `mask` the output:
df['new'] = (df.groupby(df['Value'].ne(df['Value'].shift()).cumsum()).cumcount()
.mask(df['diff'].isna())
)
output:
Value diff new
0 6 NaN NaN
1 5 -1.0 0.0
2 5 0.0 1.0
3 5 0.0 2.0
4 4 -1.0 0.0
5 4 0.0 1.0
6 4 0.0 2.0
7 4 0.0 3.0
8 4 0.0 4.0
9 5 1.0 0.0
10 5 0.0 1.0
11 5 0.0 2.0
12 5 0.0 3.0
13 6 1.0 0.0
14 7 1.0 0.0
CodePudding user response:
If performance is important, count consecutive 0 values from the difference column:
m = df['diff'].eq(0)
b = m.cumsum()
df['out'] = b.sub(b.mask(m).ffill().fillna(0)).astype(int)
print(df)
Value diff need out
0 6 NaN na 0
1 5 -1.0 0 0
2 5 0.0 1 1
3 5 0.0 2 2
4 4 -1.0 0 0
5 4 0.0 1 1
6 4 0.0 2 2
7 4 0.0 3 3
8 4 0.0 4 4
9 5 1.0 0 0
10 5 0.0 1 1
11 5 0.0 2 2
12 5 0.0 3 3
13 6 1.0 0 0
14 7 1.0 0 0