Home > Software design >  Data structure for querying if a given interval is enclosed by a set of other intervals
Data structure for querying if a given interval is enclosed by a set of other intervals

Time:09-30

There are millions of non-overlapping contiguous intervals, like [a, c), [c, e), [e, g)... They are sent to me in a random order and I would like to query at any time if some other given interval can be enclosed by a combination of those contiguous intervals received.

For instance, I want my_interval_set to have a method add(c) to add one of the contiguous intervals c, a method enclose(i) to test whether an arbitrary interval i can be enclosed by a combination of intervals added before.

Something like

# add [a, c), [c, e)
my_interval_set.add([c,e])
my_interval_set.add([a,c])

my_interval_set.enclose([a,e])  # should return true

Wondering what would be a suitable data structure for this case?

If it makes a difference, we can assume the boundaries of an i will always match the boundaries of some cs, so in the above example there will not a be a query my_interval_set.enclose([b,d]) as b falls within the interval [a,c).

If there has to be trade-off between the efficiency of two operations, enclose() is more important to me.


My attempt is to use insertion sort and merge adjacent intervals (like [a,c) [c,e] -> [a,e)) as one being added, for processing an enclose query we just scan over the whole list once.

def add(c):
    idx = my_list.insert(c)  # insertion sort the interval c
    my_list.merge_if_adjacent(idx)  # check neighbour elements of c and merge if needed


def enclose(i):
    for interval in my_list:
        if interval.lo <= i.lo and interval.hi >= i.hi:
            return True
    return False

In the worst case both add() and enclose() are O(n) time complexity, which seems inefficient to me as there could millions of contiguous intervals to be added and queried from.

CodePudding user response:

Have a balanced binary tree of some sort of endpoints along with whether they are open or closed.

To insert [a, b] your logic will look like this:

let x be the last event before a (if any)
let y be the next event after b (if any)
delete all events between x and y (if any)
if x doesn't exist or is a close:
    insert (a, open) into the tree
if y doesn't exist or is an open:
    insert (b, close) into the tree

Now it may be that inserting an interval will be very expensive (because you are dealing with a long range). But the amortized cost of every interval is O(log(n)) because that's the cost to do the lookup, the cost to insert if you need to, and the cost to delete later.

The cost of testing enclosure is O(log(n)). Find the last event before or equal to the start of the interval. If it is an open, and the next event is a close at or after the end, then it is enclosed. Else not.

  • Related