I have an array that looks like this (it's actually a column in a pandas DataFrame, but suggestions for how to do it in plain Python would also work):
[0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1, 0]
For each subsequence of 1s, I need to find its midpoint: the index of the element in the middle of that subsequence, or the one closest to it. For the example above, these would be 6 for the first subsequence, 18 for the second, and so on.
This is easy to do with a naive loop, but I wonder if there is a more efficient way (maybe a built-in pandas function?).
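For reference, the naive single-pass loop mentioned above could look roughly like this. It is only a sketch, not code from the question; the function name `midpoints_naive` is hypothetical. It uses start + length // 2, so an even-length run gets the upper of its two middle indices.

```python
def midpoints_naive(data):
    """Return the midpoint index of every run of 1s, using one explicit loop."""
    midpoints = []
    run_start = None  # index where the current run of 1s began, or None
    for i, value in enumerate(data):
        if value == 1 and run_start is None:
            run_start = i  # a new run of 1s begins here
        elif value != 1 and run_start is not None:
            # the run covered indices run_start .. i - 1
            midpoints.append(run_start + (i - run_start) // 2)
            run_start = None
    if run_start is not None:
        # the data ended while still inside a run of 1s
        midpoints.append(run_start + (len(data) - run_start) // 2)
    return midpoints


data = [0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0]
print(midpoints_naive(data))  # [6, 18, 25]
```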
For a pure Python solution, you can use itertools.groupby, which by default groups runs of consecutive equal values.
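import itertools

data = [0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0]

start = 0  # index where the current run begins
midpoints = []
for key, group in itertools.groupby(data):
    group = list(group)
    if key == 1:
        # midpoint = run start + half the run length (rounded down)
        midpoints.append(start + len(group) // 2)
    start += len(group)  # advance to the start of the next run

print(midpoints)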
[6, 18, 25]
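To see what itertools.groupby hands to the loop, it helps to print the run lengths for the same data (the name `runs` is just for illustration). Each run's start index is the sum of the lengths before it, which is exactly what the `start` counter above keeps track of.

```python
import itertools

data = [0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0]

# groupby yields one (key, iterator) pair per run of consecutive equal values
runs = [(key, len(list(group))) for key, group in itertools.groupby(data)]
print(runs)
# [(0, 4), (1, 5), (0, 7), (1, 5), (0, 3), (1, 3), (0, 1)]
```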
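For a pandas solution, we first keep only the 1s, then group them by run using the s.ne(s.shift()).cumsum() trick, which labels consecutive equal values the same way itertools.groupby groups them, and finally get the size and start position of each group (idxmin returns the first label of each group here, since every value in it is 1). Adding half the size, rounded down, to the start position gives the midpoint; for an even-length run this is the upper of the two middle indices.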
import pandas as pd

s = pd.Series([0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0])

midpoints = (
    s.loc[lambda s: s.eq(1)]                  # keep only the 1s
    .groupby(s.ne(s.shift()).cumsum())        # group them by run ID
    .agg(['idxmin', 'size'])                  # first label and length of each run
    .eval('size // 2 + idxmin')               # start + half the length
)
print(midpoints)
2 6
4 18
6 25
dtype: int64
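The s.ne(s.shift()).cumsum() expression is the part that mimics itertools.groupby. Printing the run IDs it produces for the same series makes it clearer (the names `run_starts` and `run_id` are just for illustration):

```python
import pandas as pd

s = pd.Series([0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0])

# True wherever a value differs from the previous one, i.e. a new run starts;
# the first element is compared against NaN, so it always starts run 1
run_starts = s.ne(s.shift())
# the running count of run starts is constant within each run: a run ID
run_id = run_starts.cumsum()
print(run_id.tolist())
# [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7]
```

The runs of 1s get IDs 2, 4 and 6, which is why those numbers appear as the index of the result above.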
Try with groupby:
- Use the series (i.e. column) index to groupby sequences of 0s and 1s with srs.ne(srs.shift()).cumsum()
- Take the average of the first and last index of each sequence, rounded down
- Keep only the unique values where the original column value is 1
import pandas as pd

srs = pd.Series([0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0])
# group the index labels by run ID (a new ID starts wherever the value changes)
g = srs.index.to_series().groupby(srs.ne(srs.shift()).cumsum())
>>> g.transform("first").add(g.transform("last")).floordiv(2).where(srs.eq(1)).dropna().unique()
array([ 6., 18., 25.])
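If the goal is to avoid Python-level grouping entirely, a NumPy sketch along these lines should also work. It is not taken from either answer above: it assumes the values are 0/1 (anything other than 1 is treated as 0), and the helper name `run_midpoints` is hypothetical. Padding with a 0 on each side and taking np.diff marks each run's first index with +1 and the position just past its last index with -1.

```python
import numpy as np


def run_midpoints(values):
    """Midpoint positions of every run of 1s, computed without a Python loop."""
    # 1 where the value is 1, else 0, padded with a 0 on each side so that
    # runs touching either end of the array still get both a start and an end
    mask = np.concatenate(([0], (np.asarray(values) == 1).astype(int), [0]))
    edges = np.diff(mask)
    starts = np.flatnonzero(edges == 1)   # first index of each run
    ends = np.flatnonzero(edges == -1)    # one past the last index of each run
    return starts + (ends - starts) // 2


data = [0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0]
print(run_midpoints(data).tolist())  # [6, 18, 25]
```

It returns positions, so for a pandas column with a non-default index, `s.index[run_midpoints(s.to_numpy())]` maps them back to labels. Like the itertools answer, it uses start + length // 2, so an even-length run gets the upper of its two middle indices, whereas the (first + last) // 2 formula in the answer above picks the lower one.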