Home > Enterprise >  Quicksort performance in Python - random pivot vs static
Quicksort performance in Python - random pivot vs static

Time:01-26

Ok, this is driving me crazy. I created a straightforward Quicksort implementation (from CLR) with random choice of pivot:

def qsr(l, s, e):
    def qsrpartition(l, s, e):
        pivotindex=random.randrange(s,e 1)
        l[e], l[pivotindex] = l[pivotindex], l[e]
        p = l[e]
        i = s - 1
        for j in range(s, e):
            if l[j] <= p:
                i = i 1
                l[i], l[j] = l[j], l[i]
        l[i 1], l[e] = l[e], l[i 1]
        return i 1
        
    if s < e:
        q = qsrpartition(l, s, e)
        qsr(l, s, q-1)
        qsr(l, q 1, e)    

I also have a version with static choice of pivot (comment out the first two lines of qsrpartition). If I run the two versions on a random array, the randomized version above is 100 faster than the one always picking the last element as pivot (1s vs 10s).

li = random.choices(range(5000), k=5000)
li2 = random.choices(range(5000), k=5000)
r1 = timeit.timeit(lambda:qs(li, 0, len(li)-1),number=10)
r2 = timeit.timeit(lambda:qsr(li2, 0, len(li2)-1),number=10)
print(r1, r2)

11.10 0.08

The result above holds probabilistically across runs, array lengths, use of sample with or without replacement, order in which I run the functions etc. For a random choice of array, the choice of pivot should not matter, as the last element should be as good as any intermediate. I feel that there are an obvious explanation, which I am missing here.

CodePudding user response:

Your benchmarking code is not measuring what you intended.

As already commented, the lists li and li2 are sorted in the first of the 10 runs, and the 9 others are sorting already sorted lists. When a partition is already sorted, then choosing the last entry as pivot represents a worst case, giving the random pivot variant the advantage in 9 out of the 10 runs.

I would also make more runs to get a better estimate.

Here is a correction:

k = 5000
runs = 50
print(" qs", timeit.timeit(lambda: qs(random.choices(range(k), k=k), 0, k-1),number=runs))
print("qsr", timeit.timeit(lambda:qsr(random.choices(range(k), k=k), 0, k-1),number=runs))

This gives an output that slightly favors the non-random pivot version for the plain reason it doesn't have to execute those two extra statements. On repl.it I got output like this:

 qs 2.3873363960025017
qsr 2.9921084540001175

CodePudding user response:

You measure drawing random numbers and swap vs. no swap, in addition to static vs. random pick.
To just measure random vs. non-random pivot, don't just comment out the first two lines:
Unconditionally draw that random number, use a static swap (l[0], l[-1] = l[-1], l[0]) vs. the random one.

But Gene's pointed out the main fault: quicksorting ordered arrays.
Currently, you are measuring worst case vs. "average"/"random".

  • Related