I'm looking for an algorithm to achieve the following:
Given an arbitrary set of "intervals", where an interval is defined simply by a start and an end (two floating-point numbers with end >= start), I would like to organise these intervals into one or more "bins"/"buckets"/groups such that:
- No two intervals within a single bin overlap each other
- The minimum number of bins is used
My solution has been to iterate over the intervals and, for each one, essentially perform a binary search on each bin until one is found that can accommodate the interval (creating a new bin if necessary). This works, but I'm left wondering whether it can be optimised, because the result differs depending on the order in which the intervals are processed. I have a feeling that sorting the intervals from largest to smallest (end - start) gives better results, but I'm not certain.
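For reference, here is a minimal Python sketch of the first-fit approach described above, assuming intervals are `(start, end)` tuples, each bin keeps its intervals sorted, and touching endpoints count as overlapping; the function names are illustrative:

```python
from bisect import bisect_left, insort

def fits(bin_intervals, start, end):
    """Binary-search a bin (kept sorted by start) for a conflict."""
    i = bisect_left(bin_intervals, (start, end))
    before_ok = i == 0 or bin_intervals[i - 1][1] < start             # predecessor ends before we start
    after_ok = i == len(bin_intervals) or end < bin_intervals[i][0]   # successor starts after we end
    return before_ok and after_ok

def pack_first_fit(intervals):
    """First-fit packing; the result depends on input order, as noted above."""
    bins = []
    for start, end in intervals:
        for b in bins:
            if fits(b, start, end):
                insort(b, (start, end))
                break
        else:
            bins.append([(start, end)])
    return bins
```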
Answer:
Sort the intervals by their end point; then take each interval in end-point order and put it into the bucket with the largest right-most end point such that bucket.max_endpoint < interval.startpoint. If there is no such bucket, start a new one.
If you keep the buckets in a binary search tree keyed by max_endpoint, then you can find the best bucket in O(log |buckets|) time, for O(N log N) overall.
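Here is a minimal sketch of that greedy in Python, under the same assumptions as above (`(start, end)` tuples, touching endpoints treated as overlapping, matching the strict `<`). For simplicity it keeps a plain sorted list of bucket endpoints rather than a balanced tree, so each update costs O(|buckets|); swapping in a balanced BST or `sortedcontainers.SortedList` would recover the O(log |buckets|) bound:

```python
from bisect import bisect_left

def pack_intervals(intervals):
    """Partition intervals into the minimum number of buckets of
    pairwise non-overlapping intervals, using the greedy above."""
    buckets = []  # each bucket is a list of intervals
    ends = []     # ends[i] is the max endpoint of buckets[i], kept sorted
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Number of buckets whose max endpoint is < start; the one
        # just before that position is the tightest fit, if any.
        i = bisect_left(ends, start)
        if i == 0:
            buckets.append([(start, end)])  # no bucket fits: open a new one
            ends.append(end)
        else:
            b = buckets.pop(i - 1)          # tightest-fitting bucket
            ends.pop(i - 1)
            b.append((start, end))
            # Its new max endpoint is `end`, the largest seen so far
            # (intervals arrive in end order), so it goes at the back
            # and both lists stay sorted.
            buckets.append(b)
            ends.append(end)
    return buckets
```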
The proof that choosing the tightest-fitting bucket for each interval is optimal is simple:
Imagine that you already know an optimal assignment of intervals to buckets, you place the intervals into their buckets in end-point order, and at some point you don't choose the tightest-fit bucket...
If you change your mind and choose the tightest-fit bucket instead, you'll be in the same situation, except that one of the buckets will have more room on the right. Having more room never hurts, so the remaining intervals will still fit. Repeating this exchange at every such step turns the optimal assignment into the greedy one without ever adding a bucket, so the greedy uses the minimum number of buckets.
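As a concrete illustration of why the tightest fit matters, here is a hypothetical run of the `pack_intervals` sketch above. When (2.5, 3) is placed, both existing buckets (ending at 1 and at 2) can take it; the greedy picks the one ending at 2, which leaves the bucket ending at 1 free for (1.2, 4). Choosing the looser bucket instead would force a third bucket:

```python
intervals = [(0, 1), (0.5, 2), (2.5, 3), (1.2, 4)]
for bucket in pack_intervals(intervals):
    print(bucket)
# [(0.5, 2), (2.5, 3)]
# [(0, 1), (1.2, 4)]
```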