How to find 2 special elements in the array in O(n)-CodePudding

Let a1,...,an be a sequence of real numbers. Let m be the minimum of the sequence, and let M be the maximum of the sequence.

I proved that there exists 2 elements in the sequence, x,y, such that |x-y|<=(M-m)/n.

Now, is there a way to find an algorithm that finds such 2 elements in time complexity of O(n)?

I thought about sorting the sequence, but since I dont know anything about M I cannot use radix/bucket or any other linear time algorithm that I'm familier with.

I'd appreciate any idea. Thanks in advance.

CodePudding user response：

First note that the real threshold should be (M-m)/(n-1).

The first step is to calculate the min m and max M elements, in O(N).

You calculate the mid = (m M)/2value.

You concentrate the value less than mid at the beginning, and more than mid at the end of he array.

You select the part with the largest number of elements and you iterate until very few numbers are kept.

If both parts have the same number of elements, you can select any of them. If the remaining part has much more elements than n/2, then in order to maintain a O(n) complexity, you can keep onlyn/2 1 of them, as the goal is not to find the smallest difference, but one difference small enough only.

As indicated in a comment by @btilly, this solution could fail in some cases, for example with an input [0, 2.1, 2.9, 5]. For that, it is needed to calculate the max value of the left hand, and the min value of the right hand, and to test if the answer is not right_min - left_max. This doesn't change the O(n) complexity, even if the solution becomes less elegant.

Complexity of the search procedure: O(n) O(n/2) O(n/4) ... O(2) = O(2n) = O(n).

CodePudding user response：

First find out n, M, m. If not already given they can be determined in O(n).
Then create a memory storage of n 1 elements; we will use the storage for n 1 buckets with width w=(M-m)/n. The buckets cover the range of values equally: Bucket 1 goes from [m; m w[, Bucket 2 from [m w; m 2*w[, Bucket n from [m (n-1)*w; m n*w[ = [M-w; M[, and the (n 1)th bucket from [M; M w[.
Now we go once through all the values and sort them into the buckets according to the assigned intervals. There should be at a maximum 1 element per bucket. If the bucket is already filled, it means that the elements are closer together than the boundaries of the half-open interval, e.g. we found elements x, y with |x-y| < w = (M-m)/n.
If no such two elements are found, afterwards n buckets of n 1 total buckets are filled with one element. And all those elements are sorted. We once more go through all the buckets and compare the distance of the content of neighbouring buckets only, whether there are two elements, which fulfil the condition. Due to the width of the buckets, the condition cannot be true for buckets, which are not adjoining: For those the distance is always |x-y| > w.

(The fulfilment of the last inequality in 4. is also the reason, why the interval is half-open and cannot be closed, and why we need n 1 buckets instead of n. An alternative would be, to use n buckets and make the now last bucket a special case with [M; M w]. But O(n 1)=O(n) and using n 1 steps is preferable to special casing the last bucket.)

The running time is O(n) for step 1, 0 for step 2 - we actually do not do anything there, O(n) for step 3 and O(n) for step 4, as there is only 1 element per bucket. Altogether O(n).

This task shows, that either sorting of elements, which are not close together or coarse sorting without considering fine distances can be done in O(n) instead of O(n*log(n)). It has useful applications. Numbers on computers are discrete, they have a finite precision. I have sucessfuly used this sorting method for signal-processing / fast sorting in real-time production code.

About @Damien 's remark: The real threshold of (M-m)/(n-1) is provably true for every such sequence. I assumed in the answer so far the sequence we are looking at is a special kind, where the stronger condition is true, or at least, for all sequences, if the stronger condition was true, we would find such elements in O(n).

If this was a small mistake of the OP instead (who said to have proven the stronger condition) and we should find two elements x, y with |x-y| <= (M-m)/(n-1) instead, we can simplify:

-- 3. We would do steps 1 to 3 like above, but with n buckets and the bucket width set to w = (M-m)/(n-1). The bucket n now goes from [M; M w[.

For step 4 we would do the following alternative:

4./alternative: n buckets are filled with one element each. The element at bucket n has to be M and is at the left boundary of the bucket interval. The distance of this element y = M to the element x in the n-1th bucket for every such possible element x in the n-1thbucket is: |M-x| <= w = (M-m)/(n-1), so we found x and y, which fulfil the condition, q.e.d.