Matching integers to ranges-CodePudding

I have n integers and m ranges (think [a, b]) of numbers such that n <= m. An integer and a range may be matched if the integer is in the range, i.e., a <= integer <= b. A range may only be matched with one integer and vice versa.

Here is an example:

Ranges: [0, 5], [3, 6], [9, 21]

Integers: 1, 5

The maximum valid matching is to match 1 to the first range and 5 to the second range.

I want to find a greedy algorithm that maximizes the number of integers matched. My initial thought is to greedily choose the shortest range and match the number which is closest to the shortest range's start point (so the a in a range [a, b]). Howveer, I'm not sure this is right.

CodePudding user response：

Here's an idea: When an interval ends, you can't put an integer greater than the end in that interval. This means that you need to satisfy the ranges that end sooner first, if you intend to fill as many as possible. Now, what do you do when two intervals have equal ends? You fill the one that starts sooner. This way you increase your chances of filling the next range/interval with the same ending but a late start.

Now, suppose we sort the integers none-decreasing and the intervals, first by end then start, none-decreasing and start to fill the intervals greedily. What could happen is that an interval lands at the end of the intervals array, since it has a large end, while it has a very small start. When we go over our integers, if we skipped an integer because the first interval's start is greater than that integers, we run the risk to ignore it while we could have fitted the integer in that later coming interval with the large end and very small start.

To mitigate that, we need to check all the remaining intervals for every integer that we consider, and take the first one where we can fit the integer, if any. This deteriorates our run time complexity into an O(n * m), but I don't see how we could overcome that to make it linearithmic.

Here's the above idea converted into an algorithm:

Sort your ranges/intervals by end then start none-decreasing.
Sort your integers none-decreasing.
Have a set where you keep track of the used intervals, used_intervals.
Have an array for the resulting integers, res.
Run the following nested loop:

for integer in integers:
  for interval in intervals:
    if interval in used_intervals: continue
    if interval.start <= integer <= interval.end:
      res.push(integer)
      used_intervals.add(interval)

Time complexity analysis:

Sorting the intervals is O(m * log m) where m is the number of intervals
Sorting the integers is O(n * log n) where n is the number of integers
The nested loop is O(m * n) Since n <= m, this reduces the time complexity in terms of Big O to O(max(m * n, m * log m)).

Space complexity analysis: O(m), sorting, the result array, the set

I'm not sure if we can optimize this, not the time complexity but still an optimization. Any feedback is appreciated.

CodePudding user response：

What if you matched every possible match, allowing multiple connections w.r.t. integers, and eliminated prior matches whenever they participate in multiple connections w.r.t. ranges? You'd have to follow up by deleting any remaining multiples to obtain a valid result.

In your example, you'd iterate through integers:

1 : match [0,5]
5 : match [0,5], [3,6] then eliminate [0,5]

A little more complex example might be 1, 5, 7 with ranges [0,5], [2,6], [3,10], [5,21]

1 : match [0,5]
5 : match [0,5], [2,6], [3,10], [5,21] then eliminate [0,5]
7 : match [3,10], [5,21] then eliminate [3,10]

You're left with:

1 : [0,5]
5 : [2,6], [3,10], [5,21]
7 : [5,21]

You preferentially eliminate [5,21] from 5 because the range has multiple placements and you're eliminating from an integer with multiple placements. And finally, you eliminate [3,10] (although your could eliminate [2,6]) to remove all multiples among the integers.