K Closest with unsorted array-CodePudding

I am prepping for interview leet-code type problems and I came across the k closest problem, but given a sorted array. This problem requires finding the k closest elements by value to an input value from the array. The answer to this problem was fairly straight forward and I did not have any issues determining a linear-time algorithm to solve it.

However, working on this problem got me thinking. Is it possible to solve this problem given an unsorted array in linear time? My first thought was to use a heap and that would give an O(nlogk) time complexity solution, but I am trying to determine if its possible to come up with an O(n) solution? I was thinking about possibly using something like quickselect, but the issue is that this has an expected time of O(n), not a worst case time of O(n).

Is this even possible?

CodePudding user response：

Your heap solution would be very welcome at an interview, I'm sure.

If you really want to get rid of the logk, which in practical applications should seldom be a problem, then yes, using Quickselect would be another option. Something like this:

Partition your array in values smaller and larger than x. <- O(n).
For the lower half, run Quickselect to find the kth largest number, then take the right-side partition which are your k largest numbers. <- O(n)
Repeat step 2 for the higher half, but for the k smallest numbers. <- O(n)
Merge your k smallest and k largest numbers and extract the k closest numbers. <- O(k)

This gives you a total time complexity of O(n), as you said.

However, a few points about your worry about expected time vs worst-case time. I understand that if an interview question explicitly insists on worst-case O(n), then this solution might not be accepted, but otherwise, this can well be considered O(n) in practice.

The key here being that for randomized quickselect and random or well-behaved input, the probability that the time complexity goes beyond O(n) decreases exponentially as the input grows. Meaning that already at largeish inputs, the probability is as small as guessing at a specific atom in the known universe. The assumption on well-behaved input concerns being somewhat random in nature and not adversarial. See greybeard's comment below, and this discussion on a similar (not identical) problem.

CodePudding user response：

The median-of-medians algorithm makes Quickselect take O(n) time in the worst case.

It is used to select a pivot:

Divide the array into groups of 5 (O(n))
Find the median of each group (O(n))
Use Quickselect to find the median of the n/5 medians (O(n))

The resulting pivot is guaranteed to be greater and less than 30% of the elements, so it guarantees linear time Quickselect.

After selecting the pivot, of course, you have to continue on with the rest of Quickselect, which includes a recursive call like the one we made to select the pivot.

The worst case total time is T(n) = O(n) T(0.7n) T(n/5), which is still linear. Compared to the expected time of normal Quickselect, though, it's pretty slow, which is why we don't often use this in practice.