Home > Net >  Finding Number of duplicates in an array
Finding Number of duplicates in an array

Time:03-07

I have an array, say 1,3,3,1,2 The output of the code must be 4(2 repetitions of 1 2 repetitions of 3=4). How can I do this in C? Here's my attempt.

#include <stdio.h>
int main(){
int n,i,j,temp;
scanf("%d",&n);
int arr[n];
for(i=0;i<n;i  ){
  scanf("%d",&arr[i]);
}
for(i=0;i<n;i  ){
  int min = i;
  for(j=i 1;j<n;j  ){
    if(arr[j]<arr[min]) min=j;
  }
  temp= arr[min];
  arr[min]=arr[i];
  arr[i]=temp;
  
}
  int count=1;
  for(i=0;i<n;i  ){
    if(arr[i]==arr[i 1])count  ;
    else continue;
  }
  printf("%d",count);
}

CodePudding user response:

It looks like your loop has a couple of problems.

  1. It indexes past the end of the array, which is undefined behavior
  2. It doesn't understand when to count the first item in a group of duplicates

Regarding #1 it's nice to just start your loop at 1 instead of 0 and then check index i-1 against i.

Regarding #2 your code works but only when there's only one number that has duplicates. This is because you started the count at 1. However, when you encounter another group, that assumption breaks down. The simplest way is to just record whether you're starting a new group.

Let's put this all together:

int count = 0;
int first = 1;
for(i = 1; i < n; i  ) {
    if (arr[i-1] == arr[i]) {
        count  = first   1;
        first = 0;
    } else {
        first = 1;
    }
}

As for the sorting step, it's using a wildly inefficient algorithm. This is fine for small datasets, but you'll have a problem if you have a very large number of inputs. It would be wise to use something like qsort instead. There are many examples out there for how to do this.

So, your runtime right now is O(N^2). With quicksort it becomes O(N.logN).

You can probably reduce runtime further with something like a hash table that simply stores how many of each value you've found, which you update as they arrive.

If your data ranges are well-defined and small enough, you might also benefit from using a large array instead of a hash table and store a single bit for each possible number representing when a number is seen. Actually for your case you'd need two of these because of the "first in the group" problem. Now, each number that arrives sets the "seen" bit. If it was already seen, set the "duplicate" bit and increment the count. If the "duplicate" bit is not set, increment the count. Now you pretty much guarantee blazing-fast O(N) runtime where testing for and counting a duplicate value is O(1).

CodePudding user response:

What you need is to change this for loop.

int count=1;
for(i=0;i<n;i  ){
  if(arr[i]==arr[i 1])count  ;
  else continue;
}

It can look for example the following way

int count = 0;

for ( i = 0; i < n; )
{
    int j = i;
    while (   i < n && arr[i-1] == arr[i] );

    if ( !( i - j < 2 ) ) count  = i - j;
}
  • Related