Critique my iterative binary search implementation in C-CodePudding

I implemented an iterative version of binary search and I'd like you to tell me what you would do differently and why. You're certainly welcome to critique style (e.g., header style, brackets, comments, etc.), but that's not really what I'm looking for.

I would like to know, for example, how you would handle large arrays that would cause overflow errors with the indexes (I know, use a different data type, but would you only use one version with long int, unsigned lont int or whatever or would you have multiple and insert based on worst case). Do you see any edge cases that this implementation would fail? Could you make it faster? etc.

Thanks,

int binary_search_iterative(const int *arr,
                            const int arr_size,
                            const int target) 
{
    /*
    Performs iterative binary search on a sorted integer array. 
    
    Parameters
    ----------
    arr : int*
        A sorted integer array.
    
    arr_size : int
        Size of input array.
        
    target : int
        Value to look for in input array.
        
    Returns the index at which the value is found or -1.
    */
    
    int low = 0;
    int high = arr_size - 1;
    
    while (low <= high)
    {
        int mid = (low   high) / 2;
        
        if (arr[mid] < target)      // Ignore left half
        {
            low = mid   1;
        }
        else if (tab[mid] > target) // Ignore right half
        {
            high = mid - 1;
        }
        else
        {
            return mid;             // Found at mid
        }
    }
    
    // Not found -> return default value
    return -1;
}

CodePudding user response：

Logarithmic speed is fine. You have a typo tab/arr.

Use an other name for target: search_key or such.

Indexes are normally size_t, but int is fine, for the failure with -1.

Put arr[mid] in a variable; though the compiler will optimize it as such.

There is a possible speed improvement (which I would not use however). At every loop step, instead of halving the search range, you can do a weighted split.

Say you search 10, at low is 5, at high is 50000. Then you could take a logarithmic 500 as new mid, which saves about 8 iterations.

return ~low;

Would return the negative ones-complement of the insert position. When the search fails one can then easily insert the missing key.

There also exists the Dijkstra convention where the higher bound is exclusive, initially the size. This saves one subtraction for the new mid. So less code, and a tiny bit faster. Java uses that.

CodePudding user response：

Your implementation has a flaw:

computing the mid point with int mid = (low high) / 2; causes an arithmetic overflow if low high exceeds INT_MAX. This will happen for large arrays with a length greater than INT_MAX / 2. It the searched value is toward the end of the array, computing mid will invoke undefined behavior, most likely produce a negative value, and dereferencing arr[mid] also invokes undefined behavior, possibly a segmentation fault. You should instead use this:
```
  int mid = low   (high - low) / 2;
```
also note the type in tab[mid] which should be arr[mid].

There are also some minor considerations:

using signed index values is necessary for your implementation because you might get a negative index value at high = mid - 1;, which will cause the loop to terminate without further issues.
yet a signed type for high and low produces suboptimal code in int mid = low (high - low) / 2; where right shifting must be adjusted for the potential case of negative values. As the value divided by 2 is always positive, you could avoid the adjustment with:
```
  int mid = low   ((high - low) >> 1);  // more efficient but less elegant
```
the loop uses 2 comparisons in each iteration, reducing the span of the slice one element below half the current span and producing early exit when finding the target, yet it uses almost 50% too many comparisons when the element is not found and it does not return the minimal index when successful. Note also that using 2 tests and a return statement prevents the compiler from generating branchless code. The first comparison is likely mispredicted, resulting in suboptimal runtimes on many CPUs.

A simpler loop avoids these shortcomings and allows for a more appropriate unsigned index type size_t:

int binary_search_iterative(const int *arr, int length, int target) 
{
    /*
    Performs iterative binary search on a sorted integer array. 
    
    Parameters
    ----------
    arr : int*
        A sorted integer array.
    
    length : int
        Number of elements of input array.
        
    target : int
        Value to look for in input array.
        
    Returns the index at which the value is found or -1.
    */
    
    size_t low = 0;
    size_t high = length;
    
    while (low < high) {
        size_t mid = low   (high - low) / 2;  // uses a simple right shift
        
        //single test, produces branchless code
        if (arr[mid] <= target) {
            low = mid;
        } else {
            high = mid;
        }
    }
    if (length > 0 && arr[low] == target) {
        // return lowest index for a match
        return low;
    } else {
        // no match. low is the insertion point
        // could return ~low to indicate insertion point
        return -1;
    }
}