I am asked to find the run time of the general form of the median of medians algorithm for groups of size g. It seems from common examples g=3,5,7
with T(n)=T(n/5) T(2n/3) cn
, T(n)=T(n/5) T(7n/10) cn
, and T(n)=T(n/7) T(5n/7) cn
, respectively, that the general for an odd number would be T(n)=T(n/g) T(1-(n/(2g)*(g 1)/2)) cn
.
However, I am struggling with what to use as median for an even number g where the median will exist between two elements and not actually in the set itself. I am told that I can ignore floors and ceilings. Intuition tells me it should simply be T(n)=T(n/g) T(1-(n/(2g)*(g)/2)) cn
, but I can't help but feel that I'm missing something.
Can anyone give advice on how the algorithm might work with groups of even numbers? I think I should be able to find the run time once I understand the algorithm.
CodePudding user response:
Your analysis is correct; when g is even, you can express the run-time as T(n) = T(n/g) T(3n/4) cn, which is your expression simplified; the inductive proof that this is O(n) whenever g > 4 is the same regardless of whether g is even or odd.
To see why this equation is true, it's easiest to think about how the expression for T(n) with odd g is derived. We can assume that our input list A has no duplicate elements, without loss of generality (by either slightly modifying the algorithm, or by replacing every element A[i] with the tuple (A[i], i) and using lexicographic comparisons). This makes the analysis much easier.
Now, Median-of-Medians Quickselect's run-time is based on the three things it does:
- Call itself recursively on the 'medians' list of size
ceil(n/g)
to find the median-of-mediansM
cn
work to group items, partition the list aroundM
, and find the median of each small group-of-g, and- Call itself recursively on either the partition with all elements less than
M
, the partition with all elements greater thanM
, or immediately return.
Ignoring the ceil and an additive O(1)
constant, this gives T(n) = T(n/g) T(New Partition Size) cn
. What's the largest the New Partition Size can be? It's max(# elements less than M, # elements greater than M)
.
When g
was odd, we had that half of the n/g
groups had medians less than M
(so (g-1)/2
, plus 1
for the median, elements in that group are less than M
), and half had medians greater than M
(again, giving (g 1)/2
'greater than M
' elements for each such group).
Since you're defining median of an even list as the arithmetic mean of the two middle elements, this gets even simpler: half of the n/g
groups have medians less than M
, and exactly half the elements of each such group is less than its median and thus less than M
; the same logic works for greater than. This means we have eliminated at least (half of n/g times g/2
), or n/4
elements, leaving us with 3n/4
as the maximum New Partition Size and T(n) = T(n/g) T(3n/4) cn
.