Finding overlapped intervals in a set of intervals-CodePudding

We have a login system that tracks how long people are connected. I would like to write a code to find people who were online at the same time. Look at this example, please:

P1: [1,7]
P2: [2,5]
P3: [3,4]
P4: [6,8]

Think of these as intervals of Person 1 to 4. I want the output of the algorithm to be something like this:

P1, P2 : [2, 3]
P1, P2, P3 : [3, 4]
P1, P2 : [4, 5]
P1, P4 : [6,7]

I tried to solve the problem with two for loops so that we get a list of people whose intervals overlap, but the problem is dealing with intervals for more than one person. for example, in the above example, [3,4] not have to come in [4, 5] in line three because it calculated as a three-person interval.

CodePudding user response：

Encode all interval bounds in a form like 1: P1, 7:-P1. Then sort all bounds chronologically and scan by increasing time. Also keep updated a list of the persons present by "excuting" insert or delete operations.

{} 1: P1 {P1} 2: P2 {P1, P2} 3: P3 {P1, P2, P3} 4:-P3 {P1, P2} 5:-P2 {P1} ...

Every configuration occurs between one time bound and the next.

Some configurations can repeat over time, but you did not specify how to handle them. With the above method, they will be listed separately.

CodePudding user response：

One approach, assuming the input is a dictionary of person -> interval:

from collections import defaultdict
from itertools import pairwise

data = {"P1": (1, 7),
        "P2": (2, 5),
        "P3": (3, 4),
        "P4": (6, 8)}

table = defaultdict(list)
for key, (start, end) in data.items():
    for i, j in pairwise(range(start, end   1)):
        table[(i, j)].append(key)

result = {k: v for k, v in table.items() if len(v) > 1}

for key, value in result.items():
    print(value, key)

Output

['P1', 'P2'] (2, 3)
['P1', 'P2', 'P3'] (3, 4)
['P1', 'P2'] (4, 5)
['P1', 'P4'] (6, 7)

CodePudding user response：

Heavily inspired by Minimum number of platforms required for a railway station, but maintaining a set of persons present instead of a count of trains present:

Also an implementation of the algorithm hinted at in Yves Daoust's answer.

The idea is to first build a chronological list of all events, where an event is either the arrival or departure of a person.

Then we iterate on the list of events, maintaining a set of persons present, adding the persons who arrive and removing the persons who leave.

Below, the chronological list of events is implemented as a python dict mapping times to python lists of events, so that events happening at the same time, if any, are naturally grouped together.

from operator import itemgetter

data = {
    'P1': [1,7],
    'P2': [2,5],
    'P3': [3,4],
    'P4': [6,8]
}

def get_overlaps(data):
    events = {}
    for person, (start, end) in data.items():
        events.setdefault(start, []).append((person, set.add))
        events.setdefault(end, []).append((person, set.remove))
    # events = {1: [('P1', add)], 2: [('P2', add)], 3: [('P3', add)], 4: [('P3', remove)], 5: [('P2', remove)], 6: [('P4', add)], 7: [('P1', remove)], 8: [('P4', remove)]}
    present_persons = set()
    start_time = 0
    for new_time, grouped_events in sorted(events.items(), key=itemgetter(0)):
        if len(present_persons) >= 2:
            yield (frozenset(present_persons), (start_time, new_time))
        start_time = new_time
        for new_person, add_or_remove in grouped_events:
            add_or_remove(present_persons, new_person)

data = {
    'P1': [1,7],
    'P2': [2,5],
    'P3': [3,4],
    'P4': [6,8]
}
overlaps = list(get_overlaps(data))

print(overlaps)
# [(frozenset({'P2', 'P1'}),       (2, 3)),
#  (frozenset({'P2', 'P1', 'P3'}), (3, 4)),
#  (frozenset({'P2', 'P1'}),       (4, 5)),
#  (frozenset({'P4', 'P1'}),       (6, 7))]

data2 = { # three events at time 5: (P1,remove), (P2,remove), (P4,add) 
    'P1': [1,5],
    'P2': [2,5],
    'P3': [3,4],
    'P4': [5,8]
}

overlaps = list(get_overlaps(data2))

print(overlaps)
# [(frozenset({'P2', 'P1'}), (2, 3)),
#  (frozenset({'P2', 'P1', 'P3'}), (3, 4)),
#  (frozenset({'P2', 'P1'}), (4, 5))]

This algorithm has time complexity O(N), where N is the number of persons. For comparison, the algorithm in Dani Mesejo's answer has time complexity O(TN), where N is the number of persons and T the maximum time that any person is present, measured in number of instants.