pandas - find index distance between batches of equal values in a row

Time:11-16

I would like to find the "distance" between the starting points of consecutive batches of 1's, or in other words, the length of each batch of "1's followed by 0's" (batches are separated by extra spaces below).

So I start with the following series:

df = pd.Series([0,0, 1,1,1,0,0,  1,1,0,  1,1,1,0,0,0,0,  1,1,1,0,0,0,  1,1,0,0])

and would like to get the following output:

0    NaN
1    5.0
2    3.0
3    7.0
4    6.0
5    NaN

I know how to count the number of consecutive 1's or the number of consecutive 0's, but I don't know how to treat a run of 1's followed by 0's as a single pattern.

Having NaN's at the beginning and end would be the ideal case but is not necessary.

CodePudding user response:

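Use diff() to compute the difference between consecutive values; a difference of 1 marks the start of a new batch. Then apply np.diff to the index labels of those starts:

import numpy as np

s = df.diff().eq(1)    # True where the value steps from 0 to 1, i.e. a batch starts
np.diff(s.index[s])    # distances between consecutive batch starts

# or a one-liner
# np.diff(np.where(df.diff().eq(1))[0])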

Output:

array([5, 3, 7, 6])

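Note: there is an edge case where the series starts with a 1. diff() returns NaN at the first position, so a batch beginning at index 0 is not detected as a start.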
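One possible way to cover that edge case, and to get the NaN-padded output asked for in the question, is to compare each value with its predecessor via shift(fill_value=0) instead of diff(). This is only a minimal sketch, assuming the series contains nothing but 0s and 1s and has a default integer index; the function name batch_start_distances is made up for illustration:

```python
import numpy as np
import pandas as pd


def batch_start_distances(s: pd.Series) -> pd.Series:
    """Distances between the starts of consecutive batches of 1's.

    Hypothetical helper; assumes `s` holds only 0s and 1s.
    """
    # A batch starts wherever a 1 follows a 0. fill_value=0 treats the
    # position before the series as 0, so a leading 1 also counts as a start.
    is_start = s.eq(1) & s.shift(fill_value=0).eq(0)
    starts = np.flatnonzero(is_start.to_numpy())

    # Pad with NaN at both ends, as in the desired output of the question.
    return pd.Series(np.concatenate(([np.nan], np.diff(starts), [np.nan])))


df = pd.Series([0,0, 1,1,1,0,0,  1,1,0,  1,1,1,0,0,0,0,  1,1,1,0,0,0,  1,1,0,0])
result = batch_start_distances(df)
print(result)
```

Because the starts are positional (np.flatnonzero), this measures distances in row positions rather than index labels; for a series with a non-default index the two can differ.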
