Ok, this is driving me crazy. I created a straightforward Quicksort implementation (from CLR) with random choice of pivot:
import random

def qsr(l, s, e):
    def qsrpartition(l, s, e):
        # Pick a random pivot and move it to the end of the range.
        pivotindex = random.randrange(s, e + 1)
        l[e], l[pivotindex] = l[pivotindex], l[e]
        # Standard Lomuto partition around the last element.
        p = l[e]
        i = s - 1
        for j in range(s, e):
            if l[j] <= p:
                i = i + 1
                l[i], l[j] = l[j], l[i]
        l[i + 1], l[e] = l[e], l[i + 1]
        return i + 1
    if s < e:
        q = qsrpartition(l, s, e)
        qsr(l, s, q - 1)
        qsr(l, q + 1, e)
I also have a version with a static choice of pivot (obtained by removing the two pivot-randomizing lines at the top of qsrpartition). If I run the two versions on a random array, the randomized version above is about 100× faster than the one always picking the last element as pivot (0.08s vs 11.1s for 10 runs; see the timings below).
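For reference, a sketch of that static-pivot version, reconstructed from the description above (the name qs is chosen to match the benchmark below; it is the same partition with the randomizing lines removed):

def qs(l, s, e):
    def qspartition(l, s, e):
        # Static pivot: always the last element of the current range.
        p = l[e]
        i = s - 1
        for j in range(s, e):
            if l[j] <= p:
                i = i + 1
                l[i], l[j] = l[j], l[i]
        l[i + 1], l[e] = l[e], l[i + 1]
        return i + 1
    if s < e:
        q = qspartition(l, s, e)
        qs(l, s, q - 1)
        qs(l, q + 1, e)

The timing code: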
import timeit

li = random.choices(range(5000), k=5000)
li2 = random.choices(range(5000), k=5000)
r1 = timeit.timeit(lambda: qs(li, 0, len(li) - 1), number=10)
r2 = timeit.timeit(lambda: qsr(li2, 0, len(li2) - 1), number=10)
print(r1, r2)
11.10 0.08
The result above holds consistently across runs, array lengths, sampling with or without replacement, the order in which I run the two functions, etc. For a random input array the choice of pivot should not matter, as the last element should be as good as any other. I feel there is an obvious explanation that I am missing here.
CodePudding user response:
Your benchmarking code is not measuring what you intended.
As already commented, the lists li and li2 are sorted in place during the first of the 10 runs, so the other 9 runs are sorting already-sorted lists. When a partition is already sorted, choosing the last entry as pivot is the worst case, which gives the random-pivot variant the advantage in 9 out of the 10 runs.
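A quick way to confirm the in-place mutation (a minimal sketch, assuming qsr as defined in the question):

import random

li = random.choices(range(5000), k=5000)
qsr(li, 0, len(li) - 1)    # first call sorts li in place
print(li == sorted(li))    # True: every subsequent timeit run gets sorted input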
I would also do more runs to get a better estimate.
Here is a correction:
k = 5000
runs = 50
print(" qs", timeit.timeit(lambda: qs(random.choices(range(k), k=k), 0, k - 1), number=runs))
print("qsr", timeit.timeit(lambda: qsr(random.choices(range(k), k=k), 0, k - 1), number=runs))
This gives an output that slightly favors the non-random-pivot version, for the plain reason that it doesn't have to execute those two extra statements. (Both timings now also include the cost of random.choices, but that overhead is identical for the two variants.) On repl.it I got output like this:
qs 2.3873363960025017
qsr 2.9921084540001175
CodePudding user response:
You are measuring the cost of drawing random numbers and of doing a swap vs. no swap, in addition to static vs. random pivot choice. To measure only random vs. non-random pivots, don't just comment out the first two lines: unconditionally draw the random number in both variants, and use a static swap (l[s], l[e] = l[e], l[s], so the indices match the current subrange) vs. the random one.
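A minimal sketch of such a fair deterministic variant (the name qsf and its structure are my own illustration, not code from the question):

import random

def qsf(l, s, e):
    def qsfpartition(l, s, e):
        # Draw and discard a random index so both variants pay the same RNG cost.
        random.randrange(s, e + 1)
        # Static swap matching the randomized variant's swap cost:
        # always move the first element of the range to the pivot position.
        l[s], l[e] = l[e], l[s]
        p = l[e]
        i = s - 1
        for j in range(s, e):
            if l[j] <= p:
                i = i + 1
                l[i], l[j] = l[j], l[i]
        l[i + 1], l[e] = l[e], l[i + 1]
        return i + 1
    if s < e:
        q = qsfpartition(l, s, e)
        qsf(l, s, q - 1)
        qsf(l, q + 1, e)

Timing qsf against qsr then isolates the effect of the pivot choice itself.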
But Gene has pointed out the main fault: quicksorting already-ordered arrays. Currently, you are measuring the worst case vs. the "average"/"random" case.
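To see that worst case directly, one could time both variants on an already-sorted input (a sketch, assuming qs and qsr as defined above; the recursion limit must be raised, since the last-element pivot recurses to a depth of about n on sorted input):

import sys
import timeit

sys.setrecursionlimit(20000)  # depth ~n for the static pivot on sorted input

k = 2000
print(" qs sorted:", timeit.timeit(lambda: qs(list(range(k)), 0, k - 1), number=5))
print("qsr sorted:", timeit.timeit(lambda: qsr(list(range(k)), 0, k - 1), number=5))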