Iterating through integers not found in a list of ranges in Python-CodePudding

So, I have a function which outputs a list of ranges, e.g. [range(1,5),range(8,13)]. I need to iterate through those integers which are not in a range in this list. For example, using the previous list I would like to be able to iterate through 5-7 in the previously mentioned list.

The lists and ranges i will be working with will likely be rather large (i.e. with integers ranging from 0 to many millions).

Is there any way to do this without converting each range to a list of numbers and using a if not in statement?

CodePudding user response：

Using Mark's template:

r = [range(1,5), range(8,13), range(20, 25), range(22, 27),  range(30, 35)]

def missing(r):
    stop = r[0].stop
    for a in r:
        yield from range(stop, a.start)
        stop = max(stop, a.stop)
    
print(list(missing(r)))

Output:

[5, 6, 7, 13, 14, 15, 16, 17, 18, 19, 27, 28, 29]

CodePudding user response：

You can zip pairs and make new ranges from the stop and start of the ranges. This will work if there is overlap like range(20, 25), range(22, 27), but if some ranges are completely contained in others (like range(10, 20), range(12, 15)) it will not . If the later is a possibility you will need to add some logic to ignore the contained ranges.

r = [range(1,5), range(8,13), range(20, 25), range(22, 27),  range(30, 35)]

def missing(r):
    g = (range(a.stop,  b.start) for a, b in zip(r, r[1:]))
    for interval in g:
         yield from interval

list(missing(r))
# [5, 6, 7, 13, 14, 15, 16, 17, 18, 19, 27, 28, 29]

For a more complicated scenario with completely overlapped ranges you can ignore the contained ranges with something like:

r = [range(1,5), range(2, 4), range(8,13), range(20, 25), range(22, 27), range(23, 25), range(30, 35)]

def missing(r):
    current = r[0]
    for next_range in r[1:]:
        if next_range.stop < current.stop:
            continue
        yield from range(current.stop, next_range.start)
        current = next_range
            
list(missing(r))
# [5, 6, 7, 13, 14, 15, 16, 17, 18, 19, 27, 28, 29]

This ignores the ranges like range(2, 4) that are contained in range(1, 5).

CodePudding user response：

The itertools module makes short work of this problem:

>>> from itertools import pairwise, chain
>>> excluded = [range(1,5), range(8,13), range(20, 30), range(34, 38)]
>>> included = [range(p.stop, q.start) for p, q in pairwise(excluded)]
>>> list(chain.from_iterable(included))
[5, 6, 7, 13, 14, 15, 16, 17, 18, 19, 30, 31, 32, 33]

For legibility, I wrote this as two separate steps:

Transform the excluded ranges into a list of included ranges. An included range is built from the stop of the previous excluded range and the start of the next excluded range.
The chaining step combines all those ranges into a single iterator.

If needed, the steps can be merged together to make a one-liner:

for i in chain.from_iterable(range(p.stop, q.start) for p, q in pairwise(excluded)):
    print(i)

Edit

In the comments, the OP noted that the inputs may be more complex that shown in the example. Here's how to prep the data by sorting the inputs and removing overlapping ranges:

from operator import attrgetter

excluded = [range(23, 25), range(1,6), range(8,16), range(2, 5), range(22, 27),
            range(3, 4), range(20, 25), range(7, 15), range(30, 35)]

pairs = sorted(map(attrgetter('start', 'stop'), excluded))
non_overlapping = []
start = None
for pair in pairs:
    if start is None:
        start, stop = pair
        continue
    next_start, next_stop = pair
    if next_start >= stop:
        non_overlapping.append((start, stop))
        start, stop = next_start, next_stop
        continue
    stop = max(stop, next_stop)
if start is not None:
    non_overlapping.append((start, stop))