Home > front end >  What would be the more efficient to find interesection in time
What would be the more efficient to find interesection in time

Time:07-05

Imagine two passages with that are represented with seconds of speaking in tuples. Example, speaking between the 566th second and 579th.

Old example

passage1 = (566,579),(573,583.33)
passage2 = (574,579.21),(614,620)

=> intersection (574,579.21)

New example

passage1 = (566,579),(570,590)
passage2 = (572,575),(577,620)

=> intersection (572,575) (577,590)

What would be the most efficient way to find the intersection between these segments? How can I represent time in python ? I am open to every idea since I haven't found any way how to represent it and make the calculations.

CodePudding user response:

First, sort the intervals within each passage to make sure they are in order of start time, and merge any overlapping intervals within each passage.

passage1 = (566,579),(573,583.33)
passage2 = (574,579.21),(614,620)       
passage3 = (566,579),(570,590)
passage4 = (572,575),(577,620)

def merge_overlapping_intervals(intervals):
    intervals = iter(sorted(intervals))
    start, end = next(intervals)
    for next_start, next_end in intervals:
        if next_start <= end <= next_end:
            end = next_end
        else:
            yield start, end
            start = next_start
            end = next_end
    yield start, end

print(list(merge_overlapping_intervals(passage1)))   # 566, 583.33

Then, find all overlaps between intervals in passage 1 with intervals in passage 2:

def interval_overlaps(intervals1, intervals2):
    """ Yield all overlaps of intervals in sequence intervals1 with intervals in sequence intervals2 """
    for start1, end1 in merge_overlapping_intervals(intervals1):
        for start2, end2 in merge_overlapping_intervals(intervals2):
            if end1 > start2 and end2 > start1:
                yield max(start1, start2), min(end1, end2)

print(list(interval_overlaps(passage1, passage2)))   # [(574, 579.21)]
print(list(interval_overlaps(passage3, passage4)))   # [(572, 575), (577, 590)]

Equivalently, you could identify all overlaps first, and then sort them and merge any overlapping overlaps.

Because of the nested loop, the time-efficiency of this is O(nm) where n and m are the number of segments in each passage. A more time-efficient algorithm for identifying overlaps should be possible, given that the passages are sorted, but would require a bit more thought to write, and I imagine may not be needed.

CodePudding user response:

Determining if two segments intersect is quite easy. If the end of either is before the start of the other, they don't intersect.

Once you know they intersect, the intersection itself is trivial to determine - it's the later of the two start times and the earlier of the two end times. Note that this could result in a zero length segment.

Now you could simply go through each of the segments in passage1 and check against all of the segments in passage2. But that would be a slow O(n²) algorithm (assuming both sequences are of the same length n), and you're interested in efficiency.

Since you say in the comments that the segments in each sequence will be non-overlapping, we can use that information to find a more efficient algorithm. We can generate a sorted timeline of start and stop events for each passage, and the periods when you have a start and stop for both represent an intersection. Unfortunately the example data you gave in the question does not meet the constraint, so I'm not sure this will work.

Edit: I see you've changed the example to meet the constraint, so now it actually works. I held off posting this because of that problem.

def intersections(seq1, seq2):
    events = sorted([(s, False, 0) for s,e in seq1]  
            [(e, True, 0) for s,e in seq1]  
            [(s, False, 1) for s,e in seq2]  
            [(e, True, 1) for s,e in seq2])
    starts = [None, None]
    for time, end_flag, seq_num in events:
        if end_flag:
            if (starts[0] is not None) and (starts[1] is not None):
                yield max(starts), time
            starts[seq_num] = None
        else:
            starts[seq_num] = time

>>> for s,e in intersections(passage1, passage2):
    print(s, e)

572 575
577 579
  • Related