I would like to find the "distance" between the starting points of two batches of 1's in a row or, in other words, the length of batches of "1's followed by 0's" (indicated with spaces below).
So I start with the following series:
df = pd.Series([0,0, 1,1,1,0,0, 1,1,0, 1,1,1,0,0,0,0, 1,1,1,0,0,0, 1,1,0,0])
and would like to get the following output:
0 NaN
1 5.0
2 3.0
3 7.0
4 6.0
5 NaN
I know how to get either the counts of the number of 1's in a row or the counts of the number of 0's in a row, but I don't know how to treat 1's followed by 0's as a pattern of its own.
Having NaN's at the beginning and end would be the ideal case but is not necessary.
CodePudding user response:
Use diff() to find the difference between consecutive elements; a value of 1 indicates the start of a new batch. Then you can use np.diff on the index:
s = df.diff().eq(1)
np.diff(s.index[s])
# or a one-liner
# np.diff(np.where(df.diff().eq(1))[0])
Output:
array([5, 3, 7, 6])
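Note: There is an edge case where the series starts with a 1. df.diff() is NaN at the first position, so a batch that begins at index 0 is not detected as a start.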
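One possible way to cover the leading-1 case and get the NaN-padded layout from the question is sketched below. It swaps diff() for a comparison against the shifted series, treating the position before the series as 0. The helper name batch_start_distances is made up for illustration, and padding with NaN at both ends is just one reading of the desired output.

```python
import numpy as np
import pandas as pd

def batch_start_distances(series):
    """Distances between the start positions of consecutive runs of 1s.

    Hypothetical helper, not part of pandas or NumPy.
    """
    # shift(fill_value=0) treats the element before the series as 0,
    # so a run that begins at position 0 also counts as a start.
    starts = series.eq(1) & series.shift(fill_value=0).eq(0)
    positions = np.flatnonzero(starts.to_numpy())
    return np.diff(positions)

df = pd.Series([0,0, 1,1,1,0,0, 1,1,0, 1,1,1,0,0,0,0, 1,1,1,0,0,0, 1,1,0,0])
dist = batch_start_distances(df)

# Pad with NaN at both ends to mirror the layout asked for in the question.
result = pd.Series(np.concatenate(([np.nan], dist, [np.nan])))
print(result)
```

Note that positions are positional (0-based) rather than index labels, so this should also behave the same if the series has a non-default index.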