Home > database >  Find Median of K last numbers from Data Stream
Find Median of K last numbers from Data Stream

Time:11-28

Problem: An odd number K is given. At the input, we gradually receive a sequence of N numbers. After reading each number (except the first K-1), print the median of the last K numbers.

I solved it using two heaps (MaxHeap and MinHeap, e.g. like here ) and added function remove_element(value) to both heaps. remove_element(value) iterates over the heap, looking for the value to remove. Removes it and then rebalances the tree. So it works in O(K log K).

And I solved the problem like this: by iterating the stream date, I add a new element to the heaps and delete the old one, which is already outside the window (K 1th element). So it works in O(N*(K log K log K)) = O(N*K).

I'm wondering 1) if I correctly estimated the time complexity of the algorithm, 2) is it possible to speed up this algorithm. 3) if not (the algorithm is optimal), then how can I prove its optimality.

As for the third question, if my estimate of the O(N*K) algorithm is correct, I think it can be proven based on the obvious idea that in order to write out the median, we need to check all K elements for each of N requests

CodePudding user response:

For a better approach, make the heaps be heaps of (value, i). And now you don't need to remove stale values promptly, you can simply keep track of how many non-stale values are on each side, and throw away stale values when they pop up from the heap.

And if more than half of a heap is stale, then do garbage collection to remove stale values.

If you do this well, you should be able to solve this problem with O(k) extra memory in time O(n log(k)).

  • Related