Home > Software design >  Safety concerns regarding Binary Search
Safety concerns regarding Binary Search

Time:12-14

The binary search algorithm is pretty cool. Here's the basic, recursive implementation in C that I see everywhere on the internet:


int binsearch(int arr[], int l, int h, int key){

    if(l < h){

        int mid = (l h)/2;

        if(key == arr[mid])
            return mid;
        else if(key < arr[mid])
            return binsearch(arr, l, mid-1, key);
        else
            return binsearch(arr, mid 1, h, key);

    }else{

        return -1; /*if key isn't found, -1 is a sign of failure*/

    }

}

/* calling and printing in main */

printf("%d\n", binsearch(array, 0, n-1, my_key) );

Although there may seem nothing wrong with this, when I run this, it doesn't work for marginal elements of the array (i.e. arr[0] and arr[n-1]), returning -1. Here's what worked for me:

#include <stdio.h>

int A[] = {-6, -5, -3, -1, 0, 4, 5, 9, 12, 13}; /* don't count, n = 10 */
int k;

int binsearch(int s, int e, int n){

    if(s < e){

        int mid = (s e)/2;
        
        if(n == A[mid]){

            return mid;

        }else if(n < A[mid]){

            return binsearch(s, mid, n); /* change: mid-1 */

        }else{

            return binsearch(mid, e, n); /* change: mid 1 */

        }

    }else{

        return -1;

    }

}

int main(){

    printf("key: "); scanf("%d", &k);

    printf("Place in array: %d\n", binsearch(0, 10, k)); /* change: n-1 to n */

}

My question: are these changes unsafe to memory (for example, is the function accessing memory out of array, as it should be 9, not 10) or inefficient? Is there something I'm missing that wouldn't have required these changes?

Thanks in advance!

CodePudding user response:

The code copied from the net (i.e. the first code block) is obviously wrong. It can never work for an array with size 1. I think there is a simple typo. Try this change:

if(l < h){   --->  if(l <= h){

I think it will fix the first code block.

Your rewrite of the code doesn't work either. Try running it on this array:

 int A[] = {-6};

and do the search like:

printf("Place in array: %d\n", binsearch(0, 1, 10));

Now you have an endless loop...

CodePudding user response:

You changed the semantics of e from index of last element to *index to one past the end. Per se this is safe, even a pointer to one past the end of any array is a valid pointer, though you must not dereference it, this would yield undefined behaviour. C , by the way, lives from this priciple, any standard algorithm operates on two pointers to the beginning and one past the end respectively (well, actually iterators, an extended concept, but pointers are such ones, too).

There's an issue in your variant, though: The change from mid 1 to mid for the second recursive call was wrong. The semantics of s as begin of the search range hasn't changed. The value at mid already has been tested, so you don't need to include it as start value for the remaining array either. Even worse: If s and e differ only by 1 then mid will be equal to s, and if you are looking for a value smaller then you will recurse with the same values for s and e, ending up in endless recursion!

In contrast the original variant had an error in as well:

As l and h mark both a point within the array the case of l == h denotes an array of length 1. The mid value in this case still needs to be checked, so the condition in the original variant actually needs to be

if(l <= h)
//    ^ (!)

to be able to find the end value, too.

CodePudding user response:

Another corner problem in both of OP's functions.

Overflow

int mid = (l h)/2; risks overflow in l h.

Given -1 <= l <= h, a non-overflow solution is

int mid = l   (h-l)/2;

This also works when l, h are some unsigned type.

Type

Array indexes may exceed the int range. Use size_t to handle even the largest of arrays.

Recall size_t is an unsigned type and so the algorithm needs to guard against mid-1 when mid == 0. The classic approach if not not specify the last index, but the last index 1 and adjust calculations and tests accordingly.


Alterative

Some untested code incorporating the above to give OP an idea.

/*
 * Return a pointer to the matching array element or NULL when not found. 
 * nmemb is the number of elements in the array.  May be 0 to SIZE_MAX.
 */
int *binsearch_int(int key, const int *base, size_t nmemb) {
  while (nmemb > 0) {
    size_t mid = nmemb/2;
    if(key == base[mid]) {
      return &base[mid];
    }

    if(key < base[mid]) {
      nmemb = mid;
    } else {
      base  = mid   1;
      nmemb -= mid   1;
    }
  }
  return NULL;
}
  • Related