Given an array of integers of size n+1 consisting of the elements [1,n]. All elements are unique except one

Time:06-04

I have been attempting to solve the following problem:

You are given an array of n+1 integers where all the elements lie in [1,n]. You are also given that one of the elements is duplicated a certain number of times, whilst the others are distinct. Develop an algorithm to find both the duplicated number and the number of times it is duplicated.

Here is my solution where I let k = number of duplications:

#include <vector>

struct LatticePoint { // to hold the duplicate and k
   int a;
   int b;
   LatticePoint(int a_, int b_) : a(a_), b(b_) {}
};

LatticePoint findDuplicateAndK(const std::vector<int>& A) {
   int n = A.size() - 1;
   std::vector<int> Numbers(n, 0);

   for (int i = 0; i < n + 1; ++i) {
      ++Numbers[A[i] - 1]; // A[i] is in [1,n], so no out-of-bounds access
   }

   int i = 0;
   while (i < n) {
      if (Numbers[i] > 1) {
         int duplicate = i + 1;
         int k = Numbers[i] - 1;
         LatticePoint result{duplicate, k};
         return result;
      }
      ++i;
   }
   return LatticePoint{-1, 0}; // not reached: a duplicate is guaranteed
}

So, the basic idea is this: we walk along the array and each time we see the number A[i] we increment Numbers[A[i] - 1]. Since only the duplicate appears more than once, the entry of Numbers with a value greater than 1 identifies the duplicate (its index plus 1), and that value minus 1 is the number of duplications. This algorithm is O(n) in time and O(n) in space.

I was wondering if someone had a solution that is better in time and/or space? (or indeed if there are any errors in my solution...)

CodePudding user response:

The idea of your code works.

But, thanks to the n+1 elements, we can achieve other tradeoffs of time and space.

If we divide the numbers between some set of buckets by value, putting n+1 numbers into buckets that together cover only n distinct values means some bucket has to wind up with more numbers than it has distinct values. This is a variant of the well-known pigeonhole principle.

So we use 2 buckets, one for the range 1..floor(n/2) and one for floor(n/2)+1..n. After one pass through the array, we know which half the answer is in. We then divide that half into halves, make another pass, and so on. This leads to a binary search which will get the answer with O(1) extra space, using ceil(log_2(n)) passes, each taking O(n) time. Therefore we get the answer in O(n log(n)) time.
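Here is a minimal sketch of that binary search, assuming the array matches the problem statement; the function name findDuplicateBinarySearch and the exact counting style are my own illustration, not code from the question:

#include <vector>

// Sketch only: binary search on the value range [1, n] with two counts
// ("buckets") per pass. O(1) extra space, ceil(log2(n)) passes of O(n) each.
int findDuplicateBinarySearch(const std::vector<int>& A) {
    int n = static_cast<int>(A.size()) - 1;
    int lo = 1, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int countLow = 0; // elements whose value lies in [lo, mid]
        for (int x : A) {
            if (x >= lo && x <= mid) ++countLow;
        }
        // More elements than distinct values in [lo, mid] means the
        // duplicate is in the lower half (pigeonhole); otherwise the upper.
        if (countLow > mid - lo + 1) hi = mid;
        else lo = mid + 1;
    }
    return lo; // the duplicated value
}

Once the duplicated value is known, one extra O(n) pass over the array (e.g. std::count) recovers k.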

Nor do we have to use 2 buckets. If we used 3, we'd take ceil(log_3(n)) passes. So as we increase the fixed number of buckets, we use more space but save time. Are there other tradeoffs?

Well, you showed how to do it in 1 pass with n buckets. How many buckets do you need to do it in 2 passes? The answer turns out to be about sqrt(n) buckets: with b buckets, each pass narrows the candidate range by a factor of b, so p passes need roughly n^(1/p) buckets. And 3 passes is possible with the cube root. And so on.

So you get a whole family of tradeoffs where the more buckets you have, the more space you need, but the fewer passes. And your solution is merely at the extreme end, taking the most space and the least time.
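To make the middle of that family concrete, here is a rough two-pass sketch using roughly sqrt(n) buckets; the function name findDuplicateTwoPass and the bucket layout are my own illustration under the assumptions of the problem statement:

#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Sketch only: two passes with about sqrt(n) buckets. Pass 1 finds which
// range of values holds the duplicate, pass 2 counts values inside it.
std::pair<int, int> findDuplicateTwoPass(const std::vector<int>& A) {
    int n = static_cast<int>(A.size()) - 1;
    int numBuckets = static_cast<int>(std::ceil(std::sqrt(n)));
    int width = (n + numBuckets - 1) / numBuckets; // values per bucket

    // Pass 1: count how many elements fall into each bucket of values.
    std::vector<int> bucketCount(numBuckets, 0);
    for (int x : A) ++bucketCount[(x - 1) / width];

    // The bucket with more elements than distinct values holds the duplicate.
    int target = 0;
    for (int b = 0; b < numBuckets; ++b) {
        int first = b * width + 1;
        int last = std::min((b + 1) * width, n);
        if (bucketCount[b] > last - first + 1) { target = b; break; }
    }

    // Pass 2: count the individual values inside the target bucket only.
    int lo = target * width + 1;
    int hi = std::min((target + 1) * width, n);
    std::vector<int> valueCount(hi - lo + 1, 0);
    for (int x : A) {
        if (x >= lo && x <= hi) ++valueCount[x - lo];
    }
    for (int i = 0; i < hi - lo + 1; ++i) {
        if (valueCount[i] > 1) return {lo + i, valueCount[i] - 1}; // {value, k}
    }
    return {-1, 0}; // not reached if the input matches the problem statement
}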

CodePudding user response:

You can reduce the scratch space to n bits instead of n ints, provided you either have or are willing to write a bitset with run-time specified size (see boost::dynamic_bitset).

You don't need to collect duplicate counts until you know which element is duplicated, and then you only need to keep that count. So all you need to track is whether you have previously seen the value (hence, n bits). Once you find the duplicated value, set count to 2 and run through the rest of the vector, incrementing count each time you hit an instance of the value. (You initialise count to 2, since by the time you get there, you will have seen exactly two of them.)

That's still O(n) space, but the constant factor is a lot smaller.
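A minimal sketch of that approach, using std::vector<bool> as a stand-in for a run-time sized bitset (boost::dynamic_bitset would work the same way); the function name findDuplicateWithBits is my own, and the returned k follows the questioner's definition (occurrences minus 1):

#include <utility>
#include <vector>

// Sketch only: seen[v - 1] records whether value v has appeared before.
// std::vector<bool> packs one bit per value, like a run-time sized bitset.
std::pair<int, int> findDuplicateWithBits(const std::vector<int>& A) {
    int n = static_cast<int>(A.size()) - 1;
    std::vector<bool> seen(n, false);

    std::size_t i = 0;
    int duplicate = -1;
    for (; i < A.size(); ++i) {
        if (seen[A[i] - 1]) { duplicate = A[i]; break; } // second occurrence
        seen[A[i] - 1] = true;
    }

    // By this point exactly two copies have been seen; count the remainder.
    int count = 2;
    for (std::size_t j = i + 1; j < A.size(); ++j) {
        if (A[j] == duplicate) ++count;
    }
    return {duplicate, count - 1}; // {duplicated value, number of duplications}
}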
