Home > Software design >  C function to check for duplication in an sorted array [without any additional array]
C function to check for duplication in an sorted array [without any additional array]

Time:01-03

I am doing some homework and i stuck in a section where i need to move all the duplicates in a sorted array to the end of the array. I know how to do it with 1 additional array but without additional array i dont know(a bit confusing pointers)

i tried this idea: (but its not working and no idea why)


int main()
{
    int newArr[12] =  {1,2,2,2,3,3,5,6,7,7,9,9 };
    int n = 12;
    int first = 0;
    int last = n - 1;
    int* D = arr   1;
    int j = 0;
        for (int i = 0; i < n; i  )
    {
        j = i   1;

        while (j < n)
        {
            if (newArr[i] != *D)
            {
                D  ;
            }
            if (newArr[i] == *D)
            {
                swap(&newArr[i], &newArr[last]);
                j  ;
                last--;
            }
            j  ;
        }
        
            
    }
    for (int i = 0; i < n; i  )
    {
        printf("%d", newArr[i]);
    }
}

the working 2 array version:

int moveDuplicatesV1(int* arr, int n)
{
    int* seen_before = (int*)calloc(n * 2   1, sizeof(int));
    assert(seen_before);
    int val = 0, count = 0, flag = 1;
    int j = 0;
    for (int i = 0; i < n; i  )
    {
        val = arr[i];
        if (seen_before[arr[i]   n] == 0)
        {
            seen_before[arr[i]   n]  ;
            count  ;
            continue;

        }
        else if (flag)
        {
            j = i   1;
            flag = 0;
        }
        while (j < n)
        {
            if (seen_before[arr[j]   n] == 0)
            {
                count  ;
                seen_before[arr[j]   n]  ;
                swap(&arr[i], &arr[j]);
                j  ;
                if (j == n)
                {
                    free(seen_before);
                    return count;
                }
                break;
            }
            /*break;*/
            j  ;
            if (j == n)
            {
                free(seen_before);
                return count;
            }
        }
    }
}

CodePudding user response:

Consider your array:

  0   1   2   3   4   5   6   7   8   9   10  11
 --- --- --- --- --- --- --- --- --- --- --- --- 
| 1 | 2 | 2 | 2 | 3 | 3 | 5 | 6 | 7 | 7 | 9 | 9 |
 --- --- --- --- --- --- --- --- --- --- --- --- 

It sounds like you're trying to get to:

  0   1   2   3   4   5   6   7   8   9   10  11
 --- --- --- --- --- --- --- --- --- --- --- --- 
| 1 | 2 | 3 | 5 | 6 | 7 | 9 | 2 | 2 | 3 | 7 | 9 |
 --- --- --- --- --- --- --- --- --- --- --- --- 

If we start at index 0 and compare to index 1, they are not the same, so we move on to comparing index 1 and index 2.

They are the same. So we increment a dupes_found variable from 0 to 1. We save arr[i] and move the rest of the array forward 1 position, then copy what was in arr[i] into the last spot in the array.

  0   1   2   3   4   5   6   7   8   9   10  11
 --- --- --- --- --- --- --- --- --- --- --- --- 
| 1 | 2 | 2 | 3 | 3 | 5 | 6 | 7 | 7 | 9 | 9 | 2 |
 --- --- --- --- --- --- --- --- --- --- --- --- 

Because we've moved the array, we don't move forward. We now compare indices 1 and 2 again. They are still the same, so we repeat, having again incremented dupes_found.

  0   1   2   3   4   5   6   7   8   9   10  11
 --- --- --- --- --- --- --- --- --- --- --- --- 
| 1 | 2 | 3 | 3 | 5 | 6 | 7 | 7 | 9 | 9 | 2 | 2 |
 --- --- --- --- --- --- --- --- --- --- --- --- 

When we're at the length of the array minus the dupes_found, the job is complete.

An implementation might look something like:

void relocate_duplicates_to_end(int *arr, size_t n) {
    size_t dupes_found = 0;

    for (size_t i = 0; i < n - 1 - dupes_found;) {
        if (arr[i] == arr[i 1]) {
            dupes_found  ;
            int temp = arr[i];
            memmove(&arr[i], &arr[i 1], sizeof(int) * (n - i - 1));
            arr[n - 1] = temp;
        }
        else {
            i  ;
        }
    }
}

CodePudding user response:

Given that the array is ordered, the duplicates are adjacent, so you need only compare the current element with its previous neighbour.

Then if they match you:

  • copy the duplicate,
  • shift the array to the right of the duplicate left by one element,
  • write the duplicate to the end.

In that case the index already points to the next element to inspect because it was shifted into the current position.

Otherwise - if no match occurs, you increment the element index to inspect the next value.

The search terminates when you either get to the end of the array or the next element is smaller then the previous element (because it was moved to the end earlier as a duplicate and implicitly either smaller or equal in value.

For example:

#include <stdio.h>
#include <string.h>

int main()
{
    int a[] =  {1,2,2,2,3,3,5,6,7,7,9,9} ;
    size_t n = sizeof(a) / sizeof(*a) ;
    
    size_t i = 1 ;
    while( i < n &&             // while: not end of array and
           a[i-1] <= a[i] )     //        next is not less than than previous
                                //        i.e. not yet run into the moved duplicates
    {
        // If next is a duplicate of previous...
        if( a[i-1] == a[i] )    
        {
            // Copy next
            int dup = a[i] ;
            
            // Shift right of duplicate left by one element
            memmove( &a[i],                       // Duplicate element position
                     &a[i 1],                     // Right of duplicate position
                     (n - i - 1) * sizeof(*a) ) ; // Elements to the right of duplicate
            
            // Put the duplicate at the end
            a[n-1] = dup ;
        }
        else
        {
            // Next element
            i   ;
        }
    }


    for (int i = 0; i < n; i  )
    {
        printf("%d", a[i]);
    }
}

Output:

12345678988

There is scope for performing fewer moves by counting a run of duplicates and doing a single move for each run, but unless you are expecting large arrays with long sequences of duplicates it may not be worth the added complexity.

CodePudding user response:

@Chris drew some pretty pictures for you and I wrote the matching implementation. Starting at position 1 instead of 0 is a minor refactoring.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 [0; i[: unique values
 [i; n - d[: possible duplicates (yet to be processed)
 [n-d; n[: duplicates

 return pointer to first duplicate value.  Note this is
 1 past the end of a if there are no duplicates.  This is
 ok just don't dereference it before checking.
*/
int *partition(size_t n, int a[n]) {
    size_t d = 0;
    for(size_t i = 1; i < n-d;) {
        if(a[i-1] == a[i]) {
            int tmp = a[i];
            memmove(a i, a i 1, (n-i-1) * sizeof(*a));
            a[n-1] = tmp;
            d  ;
            continue;
        }
        i  ;
    }
    return a   n - d;
}

void print_a(size_t n, int a[n]) {
    for(size_t i = 0; i < n; i  ) {
        printf("%d%s", a[i], i   1 < n ? ", " : "\n");
    }
}

int main() {
    int a[] = {1,2,2,2,3,3,5,6,7,7,9,9};
    size_t n= sizeof a / sizeof *a;
    partition(n, a);
    print_a(n, a);
}

and the output is:

1, 2, 3, 5, 6, 7, 9, 2, 2, 3, 7, 9

You could elect to move n - d - i - 2 elements and insert the new duplicate element at a[d] (instead of a[n-1]). This will reverse sort the duplicate partition. If you care then reverse the duplication partition before you return. It will probably be a little faster.

Another optimization would to move a run of (r - 1) duplicate values by copying the first into tmp as above, create a (r - 1) hole with memmove() then write (r-1) copies of tmp in that hole (either at a[d-r-1] or a[n-r-1]).

  •  Tags:  
  • c
  • Related