From a list of random numbers, how to identify five consecutive numbers and separate them in a new l-CodePudding

I am working on a function that is generating a list of indexes based on a provided string. For instance [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]. If the list contains at least five consecutive indexes, a new list should be created containing such indexes. For example, from the above list, the list [3, 4, 5, 6, 7] should be created.

flags = [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]
IDs = []
count1 = 0
for i in range (len(flags)-1):
    if flags[i 1]-flags[i] == 1:
        IDs.append(i)
        count1  = 1
    else:
        count1 -= 1

count1 is 5, so IDs should also have five items.

CodePudding user response：

You can use itertools.groupby with a key function that subtracts each number with an incremental counter, which would return a fixed value for each group of consecutive numbers:

from itertools import groupby, count

next(g for _, (*g,) in groupby(flags, lambda n, c=count(): n - next(c)) if len(g) >= 5)

This returns:

[3, 4, 5, 6, 7]

CodePudding user response：

Any time you break the streak, you need to completely clear the list. I suggest iterating over zip(flags, flags[1:]) as an easier way of comparing sequential items.

>>> flags = [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]
>>> streak = []
>>> for x, y in zip(flags, flags[1:]):
...     if not streak:
...         streak.append(x)
...     if y == x   1:
...         streak.append(y)
...     else:
...         streak = []
...     if len(streak) >= 5:
...         print(streak)
...
[3, 4, 5, 6, 7]

Another approach would be to simply append each item to the streak, clearing it first if the streak is broken (determined by looking at the last item in the current streak):

>>> streak = [flags[0]]
>>> for f in flags[1:]:
...     if streak[-1] != f - 1:
...         streak = []
...     streak.append(f)
...     if len(streak) >= 5:
...         print(streak)
...
[3, 4, 5, 6, 7]

CodePudding user response：

Using numpy:

import numpy as np

flags = [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]
arr = np.r_[0, np.diff(flags) == 1, 0]
slices = np.nonzero(np.diff(arr))[0].reshape(-1, 2)
print(slices)

This produces a 2-dimensional array whose rows are pairs of indices showing where each subsequence of consecutive numbers begins and ends:

[[ 0  1]
 [ 2  6]
 [ 7 10]
 [11 12]]

We can then print each subsequence of consecutive nnumbers:

for s in slices:
    print(flags[s[0]: s[1] 1])

It gives:

[0, 1]
[3, 4, 5, 6, 7]
[14, 15, 16, 17]
[19, 20]

You can filter these subsequences, leaving only ones of a specified length.

CodePudding user response：

There's a related question, Identify groups of continuous numbers in a list. Using a version of Nadia Alramli's solution, we can get all the subsequences of consecutive numbers, then filter out ones shorter than 5:

from itertools import groupby

IDs = []
# Number minus index gives offset. Repeated offsets mean consecutive numbers.
consecutive_numbers = groupby(enumerate(flags), lambda x: x[1]-x[0])
for _offset, g in consecutive_numbers:
    numbers = [x[1] for x in g]
    if len(numbers) >= 5:
        IDs.append(numbers)
print(IDs)

Output:

[[3, 4, 5, 6, 7]]

Edit: blhsing's answer is very similar to mine but a lot cleaner. Here's how it would look integrated with mine:

from itertools import groupby, count

indexes = count()
# Number minus index gives offset. Repeated offsets mean consecutive numbers.
consecutive_numbers = groupby(flags, lambda n: n-next(indexes))
IDs = [g for _offset, (*g,) in consecutive_numbers if len(g) >= 5]
print(IDs)

I hope this is easier to read, if nothing else.

I'm assuming flags is always in sorted order.

CodePudding user response：

If the indexes are in ascending order, the difference between the index value and its position (delta) in the flags list will be the same for all indexes that are consecutive. Using the Counter class (from collections) you can determine the most common delta and select flag entries that match it to get the IDs list:

flags = [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]

from collections import Counter

delta,count = Counter(n-i for i,n in enumerate(flags)).most_common(1)[0]
IDs = [n for i,n in enumerate(flags) if n-i == delta]

print(count,IDs)
# 5 [3, 4, 5, 6, 7]

The above example gets the longest consecutive streak but you can use the Counter() object in various ways to find specific streak lengths or deltas from the original character positions.

If the indexes in flags are not always in ascending order, you can first split the flags into sublists (using itertools.groupby) and select the largest one:

from itertools import groupby
streaks = [[n for _,n in g]
           for _,g in groupby(enumerate(flags),lambda f:f[1]-f[0])]

print(streaks)
# [[0, 1], [3, 4, 5, 6, 7], [14, 15, 16, 17], [19, 20], [26]]

IDs   = max(streaks,key=len)
count = len(IDs)

print(IDs, count)
# [3, 4, 5, 6, 7] 5