Home > Enterprise >  Numpy: given a set of ranges, is there an efficient way to find the set of ranges that are disjoint
Numpy: given a set of ranges, is there an efficient way to find the set of ranges that are disjoint

Time:01-24

Is there an elegant way to find the set of disjoint ranges from a set of ranges in numpy?

disjoint_ranges = [] # these are all disjoint
adjoint_ranges = [] # these do not all have to be mutually adjoint
for index, range_1 in enumerate(ranges):
    i, j = range_1 # all ranges are ordered s.t. i<j
    for swap_2 in ranges[index 1:]: # the list of ranges is ordered by increasing i
        a, b, _ = swap_2
        if a<j and a>i:
            adjoint_swaps.append(swap)
            adjoint_swaps.append(swap_2)
    else:
        if swap not in adjoint_swaps:
            swaps_to_do.append(swap)
print(adjoint_swaps)
print(swaps_to_do)

CodePudding user response:

I'm not sure with numpy but there is the following with pandas:

from functools import reduce
import pandas as pd

ranges = [
    pd.RangeIndex(10, 20),
    pd.RangeIndex(15, 25),
    pd.RangeIndex(30, 50),
    pd.RangeIndex(40, 60),
]

disjoints = reduce(lambda x, y : x.symmetric_difference(y), ranges)
disjoints
Int64Index([10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 35, 36,
            37, 38, 39, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
           dtype='int64')

CodePudding user response:

Looping on numpy array kinda defeats the purpose of using numpy. You can detect disjoint ranges by leveraging the accumulate method.

With your ranges sorted in order of their lower bound, you can accumulate the maximum of the upper bounds to determine the coverage of previous ranges over subsequent ones. Then compare the lower bound of each range to the reach of the previous ones to know if there is a forward overlap. Then you only need to compare the upper bound of each range with the next one's lower bound to detect backward overlaps. The combination of forward and backward overlaps will allow you to flag all overlapping ranges and, by elimination, find the ones that are completely disjoint from others:

import numpy as np

ranges = np.array( [ [1,8], [10,15], [2,5], [18,24], [7,10] ] )
ranges.sort(axis=0)

overlaps       = np.zeros(ranges.shape[0],dtype=np.bool)
overlaps[1:]   = ranges[1:,0] < np.maximum.accumulate(ranges[:-1,1])
overlaps[:-1] |= ranges[1:,0] < ranges[:-1,1]

disjoints = ranges[overlaps==False]

print(disjoints)
   
[[10 15]
 [18 24]]
  • Related