For the problem of merging intervals in a stream of data, one approach is to store each incoming interval in a min-heap (ordered by the interval's start). Each add(interval) pushes the interval onto the heap and also merges it with any overlapping intervals if needed.
It is said that the complexity of a single add can be worse than O(log n), but that the amortized time is O(log n). I couldn't really develop the intuition as to why this is true. I know that if merges are necessary, add(interval) can take longer than O(log n), because we need to pop elements from the heap, but how can I prove to myself (intuitively or mathematically) that the average will be O(log n)?
CodePudding user response:
This is a good question, because lots of amortized algorithms have similarly simple proofs of their costs:
The add operation takes O((m + 1) * log N) time, where m is the number of merges that need to be performed.
Each merge, however, removes a previously added element. Since an element that is added can only be removed once, we can transfer the O(log N) cost of merging and removing that element to the add operation that created it, and we arrive at an amortized cost of O(log N) per add.
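
To make that accounting concrete, here is a minimal sketch in Java. A plain binary heap only exposes its minimum element, so the sketch stores the disjoint intervals in an ordered map (a TreeMap keyed by interval start) instead; any structure with O(log N) insert, delete and neighbor lookup supports the same argument. The class name IntervalMerger and the snapshot helper are just illustrative names, not something from the question. Every removal marked as a merge deletes an interval that was inserted by exactly one earlier add, which is where the amortization comes from:

    import java.util.Map;
    import java.util.TreeMap;

    // Sketch only: disjoint intervals kept in an ordered map keyed by start.
    class IntervalMerger {
        // key = start of a stored interval, value = its end; entries stay disjoint
        private final TreeMap<Integer, Integer> intervals = new TreeMap<>();

        // Costs O((m + 1) * log N): one insertion plus one removal per merge.
        void add(int start, int end) {
            // If the closest interval starting at or before 'start' overlaps us, absorb it.
            Map.Entry<Integer, Integer> prev = intervals.floorEntry(start);
            if (prev != null && prev.getValue() >= start) {
                start = prev.getKey();
                end = Math.max(end, prev.getValue());
                intervals.remove(prev.getKey());   // a merge: removes a previously added interval
            }
            // Absorb every stored interval whose start falls inside [start, end].
            Map.Entry<Integer, Integer> next = intervals.ceilingEntry(start);
            while (next != null && next.getKey() <= end) {
                end = Math.max(end, next.getValue());
                intervals.remove(next.getKey());   // a merge: removes a previously added interval
                next = intervals.ceilingEntry(start);
            }
            intervals.put(start, end);             // the baseline O(log N) insertion
        }

        // Current disjoint intervals in start order (illustrative helper).
        Map<Integer, Integer> snapshot() {
            return new TreeMap<>(intervals);
        }
    }

For example, after add(1, 3), add(6, 8) and add(2, 7), the map holds the single interval [1, 8]; the third add performed two merges, and their cost is charged to the two earlier adds that created those intervals. The worst case is still visible here: adding N disjoint intervals and then one interval covering all of them makes that last add cost O(N log N) by itself, but the total over all N + 1 adds is still O(N log N), i.e. O(log N) per add on average.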