Home > Software engineering >  Using binary search to find the duplicate number in an array
Using binary search to find the duplicate number in an array

Time:10-24

The problem:

Given an array of integers nums containing n 1 integers where each integer is in the range [1, n] inclusive.

There is only one repeated number in nums, return this repeated number.

You must solve the problem without modifying the array nums and uses only constant extra space.

Here is one of the possible solution using binary search

class Solution(object):
    def findDuplicate(self, nums):
        beg, end = 1, len(nums)-1
        
        while beg   1 <= end:
            mid, count = (beg   end)//2, 0
            for num in nums:
                if num <= mid: count  = 1        
            if count <= mid:
                beg = mid   1
            else:
                end = mid
        return end

Example 1:

Input: nums = [1,3,4,2,2]
Output: 2
Example 2:

Input: nums = [3,1,3,4,2]
Output: 3

Can someone please explain this solution for me? I understand the code but I don't understand the logic behind this. In particular, I do not understand how to construct the if statements (lines 7 - 13). Why and how do you know that when num <= mid then I need to do count = 1 and so on. Many thanks.

CodePudding user response:

The solution keeps halving the range of numbers the answer can still be in.

For example, if the function starts with nums == [1, 3, 4, 2, 2], then the duplicate number must be between 1 and 4 inclusive by definition.

By counting how many of the numbers are smaller than or equal to the middle of that range (2), you can decide if the duplicate must be in the upper or lower half of that range. Since the actual number is greater (3 numbers are lesser than or equal to 2, and 3 > 2), the number must be in the lower half.

Repeating the process, knowing that the number must be between 1 and 2 inclusive, only 1 number is less than or equal to the middle of that range (1), which means the number must be in the upper half, and is 2.

Consider a slightly longer series: [1, 2, 5, 6, 3, 4, 3, 7]. Between 1 and 7 lies 3, 4 numbers are less than or equal to 3, so the number must be between 1 and 3. Between 1 and 3 lies 2, 2 numbers are less than or equal to 2, so the number must be over 2, which leaves 3.

The solution will iterate over all n elements of nums a limited number of times, since it keeps halving the search space. It's certainly more efficient than the naive:

    def findDuplicate(self, nums):
        for i, n in enumerate(nums):
            for j, m in enumerate(nums):
                if i != j and n == m:
                    return n

But as user @fas suggests in the comments, this is better:

    def findDuplicate(nums):
        p = 1
        while p < len(nums):
            p <<= 1
        r = 0
        for n in nums:
            r ^= n
        for n in range(len(nums), p):
            r ^= n
        return r

CodePudding user response:

This is how given binary search works. Inside binary search you have implementation of isDuplicateLessOrEqualTo(x):

mid, count = (beg   end)//2, 0
for num in nums:
  if num <= mid: count  = 1        
if count <= mid:
  return False  # In this case there are no duplicates less or equal than mid. 
                # Actually count == mid would be enough, count < mid is impossible.
else:
  return True   # In this case there is a duplicate less or equal than mid.

isDuplicateLessOrEqualTo(x) is a non-decreasing function (assume x has a duplicate, then for all i < x it's false and for all i >= x it's true), that's why you can run binary search over it.

Each iteration you run through the array, which gives you overall complexity O(n log n) (where n is size of array).

There's a faster solution. Note that xor(0..(2^n)-1) = 0 for n >= 2, because there are 2^(n-1) ones for each bit position and it's an even number, for example:

0_10 = 00_2
1_10 = 01_2
2_10 = 10_2
3_10 = 11_2
       ^
     2 ones here, 2 is even
        ^
      2 ones here, 2 is even

So by xor-ing all the numbers you'll receive exactly your duplicate number. Let's write it:

class Solution(object):
  def nearestPowerOfTwo(number):
    lowerBoundDegreeOfTwo = number.bit_length()
    lowerBoundDegreeOfTwo = max(lowerBoundDegreeOfTwo, 2)
    return 2 ** lowerBoundDegreeOfTwo

  def findDuplicate(self, nums):
    xorSum = 0
    for i in nums:
      xorSum = xorSum ^ i
    for i in range(len(nums), nearestPowerOfTwo(len(nums) - 1)):
      xorSum = xorSum ^ i
    return xorSum

As you can see that gives us O(n) complexity.

  • Related