Get ranges of True values (start and end) in a boolean list (without using a for loop)

For example, I want to convert this list

x=[False, True, True, True, True, False, True, True, False, True]

to a list of ranges (inclusive start and end positions) of the True values:

[[1,4],
 [6,7],
 [9,9]]

This is obviously possible with a for loop. However, I am looking for other options that are faster and more concise (one-liners are welcome, e.g. a list comprehension). Ideally, the approach would also work on a pandas Series.

CodePudding user response：

A solution with Pandas only:

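import pandas as pd

s = pd.Series(x)
# Every False increments the counter, so each run of True values shares one group id
grp = s.eq(False).cumsum()
# Keep only the True positions, then take the first and last index of each group
arr = grp.loc[s.eq(True)] \
         .groupby(grp) \
         .apply(lambda g: [g.index.min(), g.index.max()])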

Output:

>>> arr
1    [1, 4]
2    [6, 7]
3    [9, 9]
dtype: object

>>> arr.tolist()
[[1, 4], [6, 7], [9, 9]]
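
To see why the cumulative sum works as a grouping key, it can help to print the intermediate values. This is only an illustrative sketch of the steps above; the name true_labels is mine and not part of the answer.

```python
import pandas as pd

x = [False, True, True, True, True, False, True, True, False, True]
s = pd.Series(x)

# Each False increments the counter, so all True values that sit between
# two False values end up with the same counter value (their group label).
grp = s.eq(False).cumsum()
print(grp.tolist())                  # [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]

# Keep only the labels at True positions; the False rows drop out.
true_labels = grp[s]                 # hypothetical name, for illustration only
print(true_labels.index.tolist())    # [1, 2, 3, 4, 6, 7, 9]
print(true_labels.tolist())          # [1, 1, 1, 1, 2, 2, 3]
```

Taking the minimum and maximum index within each label then gives the start and end of each run.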

Alternative:

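# fill_value=False on both shifts, so a run touching the first or last position is still detected
start = s[s.eq(True) & s.shift(1, fill_value=False).eq(False)].index
end = s[s.eq(True) & s.shift(-1, fill_value=False).eq(False)].index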

print(list(zip(start, end)))

# Output:
[(1, 4), (6, 7), (9, 9)]

Performance*

# Solution 1
>>> %timeit s.eq(False).cumsum().loc[s.eq(True)].groupby(s.eq(False).cumsum()).apply(lambda x: [x.index.min(), x.index.max()])
1.22 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Solution 2
>>> %timeit list(zip(s[s.eq(True) & s.shift(1).eq(False)].index, s[s.eq(True) & s.shift(-1, fill_value=False).eq(False)].index))
603 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

CodePudding user response：

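An option with NumPy. Pad the array with False on both sides, then compare each element with its predecessor. If the previous value is False and the current value is True, a run of True values starts at the current position. If the previous value is True and the current value is False, the run ended at the previous position, which is why 1 is subtracted from end when the two arrays are combined.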

import numpy as np

# Pad with False on both sides so runs touching either edge are detected
z = np.concatenate(([False], x, [False]))

start = np.flatnonzero(~z[:-1] & z[1:])
end = np.flatnonzero(z[:-1] & ~z[1:])

np.column_stack((start, end-1))
array([[1, 4],
       [6, 7],
       [9, 9]], dtype=int32)
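
Since the question also asks about a pandas Series, here is a minimal sketch that wraps the same padding idea in a function. The name true_ranges is hypothetical, and I use np.diff on the padded array instead of the two boolean comparisons, which should be equivalent. Note that it returns positions, not index labels, so a Series with a non-default index would need an extra lookup.

```python
import numpy as np
import pandas as pd


def true_ranges(values):
    """Return [[start, end], ...] (inclusive positions) for each run of True.

    Hypothetical helper name; accepts a list, NumPy array, or pandas Series.
    """
    mask = np.asarray(values, dtype=bool)
    # Pad with False on both sides so runs touching either edge are detected.
    padded = np.concatenate(([False], mask, [False]))
    # +1 marks a False -> True step, -1 marks a True -> False step.
    steps = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(steps == 1)
    ends = np.flatnonzero(steps == -1) - 1  # shift by one to make the end inclusive
    return np.column_stack((starts, ends)).tolist()


x = [False, True, True, True, True, False, True, True, False, True]
print(true_ranges(x))                    # [[1, 4], [6, 7], [9, 9]]
print(true_ranges(pd.Series(x)))         # [[1, 4], [6, 7], [9, 9]]
print(true_ranges([True, True, False]))  # [[0, 1]]
```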

CodePudding user response：

Here's a solution that uses scipy and pandas:

import pandas as pd
from scipy import ndimage

def boolean_vector2ranges(x):
    df1 = pd.DataFrame({'location': range(len(x)),
                        'bool': x})
    # ndimage.label gives each run of nonzero values its own positive integer;
    # 0 marks the background (the False entries)
    df1['group'] = ndimage.label(df1['bool'].astype(int))[0]
    return df1.loc[df1['group'] != 0].groupby('group')['location'].agg(['min', 'max'])

boolean_vector2ranges(x=[False, True, True, True, True, False, True, True, False, True])

returns,