Measuring the Time Complexity of Insertion And Merge Sort Hybrid


I have a very basic hybrid of merge sort and insertion sort that uses a threshold: insertion sort is applied to any sub-array whose size is at or below the threshold. Both the merge sort and the insertion sort are the most basic, widely available versions:

def hybrid_sort(array: list, threshold: int = 10):
    if len(array) > 1:
        mid = len(array) // 2
        left = array[:mid]
        right = array[mid:]

        if len(array) > threshold:
            # Recurse on both halves, then merge them back together.
            hybrid_sort(left, threshold)
            hybrid_sort(right, threshold)
            merge(array, left, right)
        else:
            # Small sub-array: fall back to insertion sort.
            insertion_sort(array)
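
The merge and insertion_sort helpers aren't shown in the post; a minimal sketch of what they are assumed to look like (a standard in-place insertion sort and a two-way merge of the sorted halves back into the parent list) would be:

def insertion_sort(array: list) -> None:
    # Standard in-place insertion sort: shift larger elements right,
    # then drop the current key into its sorted position.
    for i in range(1, len(array)):
        key = array[i]
        j = i - 1
        while j >= 0 and array[j] > key:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = key


def merge(array: list, left: list, right: list) -> None:
    # Two-way merge of the already-sorted halves back into `array`.
    i = j = k = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            array[k] = left[i]
            i += 1
        else:
            array[k] = right[j]
            j += 1
        k += 1
    # Copy whatever remains of either half.
    while i < len(left):
        array[k] = left[i]
        i += 1
        k += 1
    while j < len(right):
        array[k] = right[j]
        j += 1
        k += 1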

Unless I am completely misunderstanding, this would mean that the recurrence relation for this particular piece of code generalizes to:

T(n) = 2T(n/2) + O(n^2)

The first term comes from the merge sort recursion, and the second from the insertion sort operations.

By the master theorem, n^(log_b(a)) would equal n in this case, because you'd have n^(log_2(2)), and log_2(2) is 1, so n^1 = n.

Then, our f(n) = n^2, which is 'larger' than n, so by case 3 of the master theorem my algorithm above would be Θ(f(n)), i.e. O(n^2), because f(n) is bounded from below by n.

This doesn't seem right to me, considering we know merge sort is O(n log(n)), and I'm having a hard time wrapping my head around it. I think it's because I've not yet analyzed an algorithm that has a conditional 'if' check like this.

Can anyone illuminate this for me?

CodePudding user response:

Unless the threshold itself depends on n, the insertion sort part does not matter at all. This has the same complexity as a normal merge sort.

Keep in mind that the time complexity of an algorithm that takes an input of size n is a function of n that is generally difficult to compute exactly, and so we focus on the asymptotic behavior of that function instead. This is where the big O notation comes into play.

In your case, as long as threshold is a constant, this means that as n grows, threshold becomes insignificant and all the insertion sorts can just be grouped up as a constant factor, making the overall complexity O((n-threshold) * log(n-threshold) * f(threshold)), where f(threshold) is a constant. So it simplifies to O(n log n), the complexity of merge sort.
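
As a rough empirical sanity check (assuming the hybrid_sort from the question together with the helper sketch above), you can time the sort at doubling input sizes: an O(n log n) algorithm should take a bit more than twice as long each time n doubles, and that pattern shouldn't change with a constant threshold:

import random
import time

def time_sort(n: int, threshold: int) -> float:
    # Time one hybrid_sort call on a fresh random list of size n.
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    hybrid_sort(data, threshold)
    return time.perf_counter() - start

# For an n log n algorithm, doubling n should make the runtime a bit more
# than double, regardless of the (constant) threshold.
for threshold in (1, 16, 64):
    times = [time_sort(n, threshold) for n in (50_000, 100_000, 200_000)]
    ratios = [round(times[i + 1] / times[i], 2) for i in range(len(times) - 1)]
    print(f"threshold={threshold:>2}  doubling ratios: {ratios}")

The individual timings will be noisy, but the doubling ratios should hover a little above 2 for every constant threshold, which is the n log n signature.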

CodePudding user response:

Here's a different perspective that might help give some visibility into what's happening.

Let's suppose that once the array size reaches k, you switch from merge sort to insertion sort. We want to work out the time complexity of this new approach. To do so, we'll imagine the "difference" between the old algorithm and the new algorithm. Specifically, if we didn't make any changes to the algorithm, merge sort would take time Θ(n log n) to complete. However, once we get to arrays of size k, we stop running mergesort and instead use insertion sort. Therefore, we'll make some observations:

  • There are Θ(n / k) subarrays of the original array of size k.
  • We are skipping calling mergesort on all these arrays. Therefore, we're avoiding doing Θ(k log k) work for each of Θ(n / k) subarrays, so we're avoiding doing Θ(n log k) work.
  • Instead, we're insertion-sorting each of those subarrays. Insertion sort, in the worst case, takes time O(k^2) when run on an array of size k. There are Θ(n / k) of those arrays, so we're adding in O(nk) total work.

Overall, this means that the work we're doing in this new variant is O(n log n) - O(n log k) + O(nk). Dialing k up or down will change the total amount of work done. If k is a fixed constant (that is, k = O(1)), this simplifies to

O(n log n) - O(n log k) + O(nk)

= O(n log n) - O(n) + O(n)

= O(n log n)

and the asymptotic runtime is the same as that of regular merge sort.
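
To get a feel for how those three terms trade off, here's a small back-of-the-envelope sketch that just evaluates the model n log n - n log k + c·n·k for a fixed n across a range of k values. The constant c, meant to capture how cheap an insertion-sort step is relative to a merge step, is an assumed illustrative value, not something derived from the analysis above:

import math

def modeled_work(n: int, k: int, c: float = 0.25) -> float:
    # n log n total merge work, minus the n log k merge work we skip,
    # plus roughly c * n * k of insertion-sort work on the size-k leaves.
    # c is an assumed constant; insertion sort's inner loop is very cheap.
    return n * math.log2(n) - n * math.log2(k) + c * n * k

n = 1_000_000
for k in (1, 2, 4, 8, 16, 32, 64, 128):
    print(f"k={k:>3}  modeled work ≈ {modeled_work(n, k):,.0f}")

With these made-up constants the total dips slightly for small k and then climbs again once the nk term takes over, which is exactly the crossover described next.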

It's worth noting that as k gets larger, eventually the O(nk) term will dominate the O(n log k) term, so there's some crossover point past which increasing k starts increasing the runtime. You'd have to do some experimentation to fine-tune when to make the switch. But empirically, setting k to some modest value will indeed give you a big performance boost.
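
One way to do that experimentation (again assuming the hybrid_sort and helpers sketched earlier; the winning threshold will vary by machine, interpreter, and data) is a simple sweep over candidate thresholds at a fixed input size:

import random
import statistics
import time

def measure(threshold: int, n: int = 100_000, trials: int = 5) -> float:
    # Median wall-clock time of hybrid_sort over a few random inputs.
    samples = []
    for _ in range(trials):
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        hybrid_sort(data, threshold)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# threshold=1 is effectively plain merge sort, so it serves as the baseline.
for k in (1, 4, 8, 16, 32, 64, 128, 256):
    print(f"threshold={k:>3}  median time: {measure(k):.4f}s")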
