Why is it not recommended to use a heap to sort a LinkedList?


I know how to sort a linked list using merge sort. The question is, why don't we just use a heap to create a sorted LinkedList?

  1. Traverse the linked list and keep adding items to a min-heap.
  2. Keep taking items out of the heap (re-heapifying after each removal) and appending them to a new result LinkedList (see the sketch below).
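
For concreteness, a minimal Java sketch of those two steps, using java.util.PriorityQueue as the min-heap (the class name and values here are illustrative, not from the question):

    import java.util.LinkedList;
    import java.util.List;
    import java.util.PriorityQueue;

    public class HeapListSort {
        // Step 1: push every value into a min-heap (each offer is O(log n)).
        // Step 2: drain the heap into a fresh result list (each poll is O(log n)).
        static LinkedList<Integer> sortWithHeap(LinkedList<Integer> list) {
            PriorityQueue<Integer> heap = new PriorityQueue<>();
            for (int value : list) {
                heap.offer(value);
            }
            LinkedList<Integer> result = new LinkedList<>();
            while (!heap.isEmpty()) {
                result.addLast(heap.poll()); // appending at the tail is O(1)
            }
            return result;
        }

        public static void main(String[] args) {
            System.out.println(sortWithHeap(new LinkedList<>(List.of(5, 1, 4, 2, 3))));
            // prints [1, 2, 3, 4, 5]
        }
    }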

Step one will take O(n) for traversing the list and O(n log n) for adding items to the heap, so O(n log n) total. [Correct me if I am wrong.]

Getting an item out of the heap is O(1), and adding an item as the next node in a LinkedList is O(1). [Correct me if this is wrong.]

So the sort can be done in O(n log n) if my understanding is correct. This is the same as merge sort. In terms of memory, we are using an extra heap, so total memory is O(n log n), I assume. Merge sort can also take O(n log n), but that can be improved to O(log n).

The heap logic is the same as in the "merge k sorted linked lists" problem, where I am assuming each linked list has 1 item.

I might be completely wrong about my complexities for the heap version. If someone knows the exact reason why a heap should not be used, please explain. This is not heap sort, and this is not an in-place algorithm. If the time complexity is O(n² log n), I am not sure how.

CodePudding user response:

As far as I know, there is no law that forbids using a heap for sorting a linked list, but consider this:

  • With the same approach you can use any sorting algorithm that is used on arrays: copy the values of the list into an array, sort the array with your favorite algorithm, and recreate a linked list from the array.

  • Most code challenges that concern linked lists will expect you not to use any other O(n) data structures, but to stick to linked lists only. Depending on the challenge, there might even be a requirement to use only O(1) auxiliary memory.

  • If O(1) auxiliary memory is a requirement, then turning the linked list into a heap-organised one is not practical: it cannot offer efficient traversal from a node to its heap-children or its heap-parent. On the other hand, other efficient algorithms like merge sort and some flavours of quick sort can be implemented on a linked list structure.
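
To illustrate that last point, here is a hedged sketch of top-down merge sort over a bare singly linked node type (a hypothetical Node class, not java.util.LinkedList). It needs only the next pointers and an O(log n) recursion stack:

    public class ListMergeSort {
        // Hypothetical minimal singly linked node, for illustration.
        static final class Node {
            int value;
            Node next;
            Node(int value, Node next) { this.value = value; this.next = next; }
        }

        static Node mergeSort(Node head) {
            if (head == null || head.next == null) return head;
            // Split the list in two halves using slow/fast pointers.
            Node slow = head, fast = head.next;
            while (fast != null && fast.next != null) {
                slow = slow.next;
                fast = fast.next.next;
            }
            Node second = slow.next;
            slow.next = null;
            return merge(mergeSort(head), mergeSort(second));
        }

        static Node merge(Node a, Node b) {
            Node dummy = new Node(0, null), tail = dummy;
            while (a != null && b != null) {
                if (a.value <= b.value) { tail.next = a; a = a.next; } // <= keeps the sort stable
                else                    { tail.next = b; b = b.next; }
                tail = tail.next;
            }
            tail.next = (a != null) ? a : b;
            return dummy.next;
        }
    }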

CodePudding user response:

I know how to sort a linked list using merge sort. The question is, why don't we just use a heap to create a sorted LinkedList?

  • Traverse the linked list and keep adding items to a min-heap.
  • Keep taking the items out of the heap and heapify the heap and add to a new result LinkedList.

Several reasons, among them:

  • Asymptotic complexity is not the end of the story. Merge sort is especially clean and efficient to implement for linked lists, so it is among the best performing of the O(n log n) linked-list sorts.

  • But asymptotic complexity is still part of the story. Heap sort of an array can be done with O(1) auxiliary space by using relationships among array indices to provide an implicit representation of the tree. This supports sorting in O(n log n) steps overall because accessing an array by index is O(1), but the same is not true of a linked list. To get an O(n log n) heap sort of a linked list, you have to build the heap as an actual tree or as an array, which requires O(n) overhead (the index arithmetic behind the array version is sketched after this list). And then it's a bit strained to call it a linked-list sort, because you could have done exactly the same thing with an array, or many other data structures.

  • Also, merge sort can easily be made stable, but that's harder for heap sort and it probably requires additional overhead.
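
Coming back to the second point, a small sketch of the implicit-tree trick on an array, assuming a max-heap: parent and child positions are pure index arithmetic, so no pointers are stored and the sort needs O(1) auxiliary space. This is exactly what a linked list cannot imitate:

    public class ArrayHeapSort {
        // In-place heap sort: the tree is implicit in the indices,
        // with parent(i) = (i - 1) / 2 and children at 2*i + 1 and 2*i + 2,
        // so no memory is needed beyond the array itself.
        static void heapSort(int[] a) {
            for (int i = a.length / 2 - 1; i >= 0; i--) siftDown(a, i, a.length); // O(n) build
            for (int end = a.length - 1; end > 0; end--) {
                int tmp = a[0]; a[0] = a[end]; a[end] = tmp; // move current max to the back
                siftDown(a, 0, end);                         // restore the heap on the prefix
            }
        }

        static void siftDown(int[] a, int i, int size) {
            while (2 * i + 1 < size) {
                int child = 2 * i + 1;                                    // left child index
                if (child + 1 < size && a[child + 1] > a[child]) child++; // pick the larger child
                if (a[i] >= a[child]) return;
                int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
                i = child;
            }
        }
    }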

So the sort can be done in O(n log n) if my understanding is correct.

Sure, any data set that can be enumerated in O(n) steps can be loaded into an array and sorted via any applicable O(n log n) array-sorting algorithm in O(n log n) steps. In the particular case of a linked list, one can also reform the nodes into a sorted list without increasing the asymptotic complexity.
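
A sketch of that node-reforming idea, again with a hypothetical Node type: collect the node references into an array-backed list, sort the references by value, and rewire the next pointers, so no new nodes are allocated:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class ArraySortRelink {
        static final class Node {
            int value;
            Node next;
            Node(int value) { this.value = value; }
        }

        // O(n log n) time and O(n) extra space for the reference array;
        // the original nodes are reused rather than copied.
        static Node sortByRelinking(Node head) {
            List<Node> nodes = new ArrayList<>();
            for (Node n = head; n != null; n = n.next) nodes.add(n);
            if (nodes.isEmpty()) return null;
            nodes.sort(Comparator.comparingInt((Node n) -> n.value));
            for (int i = 0; i + 1 < nodes.size(); i++) nodes.get(i).next = nodes.get(i + 1);
            nodes.get(nodes.size() - 1).next = null;
            return nodes.get(0);
        }
    }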

This is the same as merge sort.

The same asymptotic complexity does not mean the same performance; a large constant factor can still make an important difference. Plus, even if all else were equal, the much simpler code of linked-list merge sort versus what you describe is an enormous engineering advantage: faster development and fewer bugs.

In terms of memory, we are using an extra heap, so total memory is O(n log n), I assume. Merge sort can also take O(n log n), but that can be improved to O(log n).

I have no idea where your O(n log n) idea comes from for either case. I count O(n) overhead for an auxiliary array in the heap sort case. On arrays, typical merge sort implementations have O(n) overhead, but on linked lists, O(log n) overhead (just the recursion stack) is natural for merge sort.

Regarding this comment:

In a book, I saw we won't be able to sort a singly linked list with quick sort or heap sort.

That book was wrong, even if we disallow reading the linked list out into an auxiliary array or other data structure on which to perform the actual sorting. If you do disallow such an auxiliary structure, then I don't think you can do heap sort on a singly-linked list in better than O(n² log n) steps, but you can do it. And you don't need fast random access or bi-directional traversal to perform an (expected) O(n log n) quick sort.
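
For the quick sort claim, here is a hedged sketch on a hypothetical singly linked Node type: partition the nodes into chains around a pivot, recurse, and concatenate. Taking the head as the pivot, done here for brevity, gives expected O(n log n) on random input but degrades to O(n²) on already-sorted input:

    public class ListQuickSort {
        static final class Node {
            int value;
            Node next;
            Node(int value) { this.value = value; }
        }

        // Quick sort needs neither random access nor backward links:
        // detach each node into a "less" or "greater" chain, sort both
        // chains recursively, then splice everything back together.
        static Node quickSort(Node head) {
            if (head == null || head.next == null) return head;
            Node pivot = head;
            Node less = null, greater = null;
            for (Node n = head.next; n != null; ) {
                Node next = n.next;                                   // save before relinking
                if (n.value < pivot.value) { n.next = less;    less = n; }
                else                       { n.next = greater; greater = n; }
                n = next;
            }
            Node left = quickSort(less);
            pivot.next = quickSort(greater);
            if (left == null) return pivot;
            Node tail = left;
            while (tail.next != null) tail = tail.next;               // find end of left chain
            tail.next = pivot;
            return left;
        }
    }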

CodePudding user response:

Pushing items to the heap and popping from it are both logarithmic operations. Removing an element from the heap is logarithmic because the heap needs to adjust its elements to restore the heap invariant. So you would do 2 * n * log n work, and while this is still linearithmic in Big O terms, like merge sort, it is probably slower. Or let's say at least that you can do better by using a linearithmic sorting algorithm instead of using a heap for sorting.

What you could do, and I didn't see you mention it, is to add all the elements of the linked list to the heap's backing store in O(n) time and then heapify it, which also runs in O(n) if implemented correctly. Then you would have 2n + n * log n, which reduces to O(n * log n) with better constants.
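
In Java, for instance, the difference is visible in how the PriorityQueue is built: the bulk constructor heapifies the whole collection in O(n), while offering elements one by one costs O(log n) each (values here are illustrative):

    import java.util.LinkedList;
    import java.util.List;
    import java.util.PriorityQueue;

    public class HeapifyComparison {
        public static void main(String[] args) {
            LinkedList<Integer> list = new LinkedList<>(List.of(5, 1, 4, 2, 3));

            // n individual inserts: O(n log n) to build the heap.
            PriorityQueue<Integer> slow = new PriorityQueue<>();
            for (int v : list) slow.offer(v);

            // Bulk constructor: copies the elements and heapifies in O(n).
            PriorityQueue<Integer> fast = new PriorityQueue<>(list);

            // Draining is O(n log n) either way.
            StringBuilder out = new StringBuilder();
            while (!fast.isEmpty()) out.append(fast.poll()).append(' ');
            System.out.println(out); // 1 2 3 4 5
        }
    }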

The additional space used by the heap isn't O(n log n). It is linear; while this depends on the heap implementation, most standard heap implementations provide that.

When merging k sorted linked lists with a heap you are benefiting from the fact that k is much smaller than the lengths of the lists themselves. This leads to an O((n1 + n2 + ...) * log k) algorithm, where n1, n2, ... are the lengths of the involved lists. If you didn't use a heap, that algorithm would run in O((n1 + n2 + ...) * k) time.
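
A sketch of that k-way merge with a hypothetical Node type: the heap never holds more than one node per list, so each push and pop costs O(log k):

    import java.util.Comparator;
    import java.util.PriorityQueue;

    public class MergeKLists {
        static final class Node {
            int value;
            Node next;
            Node(int value) { this.value = value; }
        }

        // Only the k current heads live in the heap, giving
        // O((n1 + n2 + ...) * log k) overall.
        static Node mergeK(Node[] heads) {
            PriorityQueue<Node> heap =
                new PriorityQueue<>(Comparator.comparingInt((Node n) -> n.value));
            for (Node h : heads) if (h != null) heap.offer(h);
            Node dummy = new Node(0), tail = dummy;
            while (!heap.isEmpty()) {
                Node min = heap.poll();                      // smallest of the k current heads
                tail.next = min;
                tail = min;
                if (min.next != null) heap.offer(min.next);  // advance that list
            }
            tail.next = null;
            return dummy.next;
        }
    }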

At the end of the day, if a heap is correctly implemented and used, you will get linearithmic time complexity and linear space complexity for sorting the linked list. But whatever you do, all other variables being equal, a standard sorting algorithm has better constants than a heap when it comes to sorting, unless there are special requirements and constraints. Let me reiterate: both methods are linearithmic in Big O terms, but sorting algorithms can have better constants for this specific purpose, which can mean a lot in real-world applications. The reason is that a heap needs to guarantee O(1) access to the smallest/largest element at any time. This puts constraints on its behavior, and you wouldn't be able to optimize it as much as a standard sorting algorithm when it comes to sorting.
