Print k largest elements in a max heap sized n, in klog(k) complexity

Time:11-19

I tried writing an algorithm that prints the k largest elements of a max heap, but I cannot do it in the right complexity.

This is the pseudocode I wrote:

Print_k_largest(A[1,…,n], k):
    If k > Heapsize(A): Error
    i = 1
    Insert(B, A[i])
    print(B[1])
    k -= 1

    While k > 0:
        if 2*i < Heapsize(A):
            Insert(B, A[2*i])
            Insert(B, A[2*i+1])
        elif 2*i == Heapsize(A):
            Insert(B, A[2*i])

        B[1] = B[Heapsize(B)]
        Heapsize(B) -= 1
        Max-Heapify(B, 1)
        print(B[1])
        i = Binary_search(A[1,…,n], B[1])
        k -= 1


In this solution I create a new max heap based on the original one, keeping its size O(k), so that the total cost of Max-Heapify and the other heap operations is O(k log k) rather than O(k log n), as I was asked to achieve. This pseudocode is based on the solution suggested here.

The idea is this: because it's a max heap, the largest element is the root, the second largest is one of the root's children, the third is either the other child or a child of the current largest, and so on. In each iteration I insert the children of the former largest (the one I printed before), remove the former largest, Max-Heapify (to restore the max-heap property, so the root is again the largest remaining) and print the new largest (the new root). The principle in this brilliant solution (unfortunately not mine, haha) is to do all the changes on a second heap whose size stays O(k) (because in each of the k iterations we add at most two new elements and remove one), so that operations like Max-Heapify cost O(log k) and not O(log n). The problem is that to insert the children of the current largest I need access to its location (index) in the original tree! I don't know how to do that without it costing O(log n) and ruining everything.

I would appreciate any help.

CodePudding user response:

(This might be exactly the algorithm you're already trying to implement, if so hope this phrasing makes it clearer)

So I would create a second heap, but it would not just be a heap of values - it would be a heap of positions in the original heap, ordered by the value at each position.

Call the original heap A and the new heap B. Start by inserting the head of A into B. Then, repeatedly pop the head of B, and insert the children (in A) into B.

So if A is built out of nodes like:

Node(value : int, left : Option[Node], right : Option[Node])

Ordered by value

Then B will be built out of meta-nodes like:

MetaNode(value : Node, left : Option[MetaNode], right : Option[MetaNode])

Ordered by value.value

Initialize B with MetaNode(A.head) as its only element

Then repeatedly do:

for i in range 0..k-1:
    current = B.pop.value
    B.push (current.left) //might be None, should be coded so this is a no-op
    B.push (current.right) //see above
    results.add(current.value)
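A runnable sketch of this in Python (an illustration, not the answerer's code): `heapq` is a min-heap, so values are negated, and a counter is stored as a tiebreaker since `Node` objects themselves aren't comparable.

```python
import heapq
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def k_largest(root: Node, k: int) -> list:
    """Return the k largest values in a max heap of Nodes, in O(k log k)."""
    # B is the meta-heap: (-value, tiebreaker, node) tuples.
    count = 0
    B = [(-root.value, count, root)]
    out = []
    while B and len(out) < k:
        _, _, node = heapq.heappop(B)
        out.append(node.value)
        # Push the popped node's children; skipping None makes that a no-op.
        for child in (node.left, node.right):
            if child is not None:
                count += 1
                heapq.heappush(B, (-child.value, count, child))
    return out
```

Each of the k iterations does one pop and at most two pushes on a heap that never exceeds k+1 elements, so the total is O(k log k).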

That isn't the simplest algorithm I've ever described, so if anything is unclear, please ask and I'll try to describe it more clearly. :)

CodePudding user response:

I have no idea why you think you need to binary search. The point of a heap is to not need to do that.

Remember that the key operations in a heap are as follows:

  1. Append - add an element to the end.
  2. Swap - exchange two elements.
  3. SiftUp - take an element and have it "bubble up" to its place: while it is not the root and is bigger than its parent, you Swap it with its parent and keep going. (Note: in the pointer version, you compare not the pointers themselves but the values they point to. The principle is the same, but you may need to rewrite the heap code to do it.)
  4. HeapInsert - you Append, then SiftUp the new element.
  5. SiftDown - the opposite of SiftUp: while the element has children and is smaller than at least one of them, you Swap it with the larger of its children and keep going.
  6. HeapPop - you Swap the first and last elements, remove the last into a temporary variable, SiftDown the new first element, then return what used to be the first element.
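A minimal sketch of such a "heap of pointers" in Python, using list indices in place of pointers (an illustration under that assumption; the direction names follow the usual convention, SiftUp after insert and SiftDown after pop):

```python
def sift_up(heap, A, pos):
    # Move heap[pos] toward the root while it beats its parent.
    while pos > 0:
        parent = (pos - 1) // 2
        if A[heap[pos]] > A[heap[parent]]:
            heap[pos], heap[parent] = heap[parent], heap[pos]
            pos = parent
        else:
            break

def sift_down(heap, A, pos):
    # Move heap[pos] toward the leaves while a child beats it.
    n = len(heap)
    while True:
        largest = pos
        for child in (2 * pos + 1, 2 * pos + 2):
            if child < n and A[heap[child]] > A[heap[largest]]:
                largest = child
        if largest == pos:
            break
        heap[pos], heap[largest] = heap[largest], heap[pos]
        pos = largest

def heap_insert(heap, A, i):
    heap.append(i)                  # Append the index, then SiftUp.
    sift_up(heap, A, len(heap) - 1)

def heap_pop(heap, A):
    heap[0], heap[-1] = heap[-1], heap[0]  # Swap first and last.
    top = heap.pop()                       # Remove into a temporary.
    if heap:
        sift_down(heap, A, 0)
    return top                             # Index of the old maximum in A.
```

Note that A is threaded through every operation: the heap stores only indices, and all comparisons dereference through A, exactly as described above.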

With these operations your pseudocode becomes:

Print_k_largest(A[1,…,n], k):
    If k > Heapsize(A): Error
    HeapInsert(B, pointer to A[1])

    While k > 0:
        PtrToElement = HeapPop(B)
        print(PtrToElement.value)
        i = index of PtrToElement in A    // known from the pointer itself, no search

        if 2*i < Heapsize(A):
            HeapInsert(B, pointer to A[2*i])
            HeapInsert(B, pointer to A[2*i+1])
        elif 2*i == Heapsize(A):
            HeapInsert(B, pointer to A[2*i])

        k -= 1

If you work in a language with native pointers, like C, you can construct pointers very easily using the & (aka "address of") and * (aka "thing pointed to by") operators. If you work in a language without pointers, you'll need to store indexes into A, like i and j, and then get values back using A[i] and A[j]. That means A has to become an argument to every operation in your heap of "pointers", because the language doesn't support actual pointers.

Either way the odds are good that you'll actually need to implement your own "heap of pointers".

In Python I cheat. Rather than use integers in the heap, I'll use tuples. So I wind up storing (A[i], i). Thus I have both value and index stored together, and compare by value first. If your language has dynamic typing and tuples, this can be more convenient than rewriting a heap implementation. But I doubt that the language you're working with will support this trick.
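A sketch of that tuple trick applied to the array form of the problem (one caveat: `heapq` is a min-heap, so the stored values are negated here, and indices are 0-based, unlike the 1-based pseudocode above):

```python
import heapq

def print_k_largest(A, k):
    """Return the k largest values of a max heap stored as a 0-indexed array A."""
    if k > len(A):
        raise ValueError("k exceeds heap size")
    # (-value, index) tuples: negation turns heapq's min-heap into a max
    # heap, and the index gives O(1) access to the children in A.
    B = [(-A[0], 0)]
    out = []
    for _ in range(k):
        neg, i = heapq.heappop(B)
        out.append(-neg)
        for c in (2 * i + 1, 2 * i + 2):   # children of A[i], 0-indexed
            if c < len(A):
                heapq.heappush(B, (-A[c], c))
    return out
```

Because the index travels with the value, no binary search is ever needed to locate the popped element's children.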
