I implemented an iterative version of binary search and I'd like you to tell me what you would do differently and why. You're certainly welcome to critique style (e.g., header style, brackets, comments, etc.), but that's not really what I'm looking for.
I would like to know, for example, how you would handle large arrays that would cause overflow errors with the indexes (I know, use a different data type, but would you only use one version with long int, unsigned lont int or whatever or would you have multiple and insert based on worst case). Do you see any edge cases that this implementation would fail? Could you make it faster? etc.
Thanks,
int binary_search_iterative(const int *arr,
const int arr_size,
const int target)
{
/*
Performs iterative binary search on a sorted integer array.
Parameters
----------
arr : int*
A sorted integer array.
arr_size : int
Size of input array.
target : int
Value to look for in input array.
Returns the index at which the value is found or -1.
*/
int low = 0;
int high = arr_size - 1;
while (low <= high)
{
int mid = (low high) / 2;
if (arr[mid] < target) // Ignore left half
{
low = mid 1;
}
else if (tab[mid] > target) // Ignore right half
{
high = mid - 1;
}
else
{
return mid; // Found at mid
}
}
// Not found -> return default value
return -1;
}
CodePudding user response:
Logarithmic speed is fine. You have a typo tab
/arr
.
Use an other name for target: search_key
or such.
Indexes are normally size_t
, but int
is fine, for the failure with -1
.
Put arr[mid]
in a variable; though the compiler will optimize it as such.
There is a possible speed improvement (which I would not use however). At every loop step, instead of halving the search range, you can do a weighted split.
Say you search 10, at low is 5, at high is 50000. Then you could take a logarithmic 500 as new mid, which saves about 8 iterations.
return ~low;
Would return the negative ones-complement of the insert position. When the search fails one can then easily insert the missing key.
There also exists the Dijkstra convention where the higher bound is exclusive, initially the size. This saves one subtraction for the new mid. So less code, and a tiny bit faster. Java uses that.
CodePudding user response:
Your implementation has a flaw:
computing the mid point with
int mid = (low high) / 2;
causes an arithmetic overflow iflow high
exceedsINT_MAX
. This will happen for large arrays with a length greater thanINT_MAX / 2
. It the searched value is toward the end of the array, computingmid
will invoke undefined behavior, most likely produce a negative value, and dereferencingarr[mid]
also invokes undefined behavior, possibly a segmentation fault. You should instead use this:int mid = low (high - low) / 2;
also note the type in
tab[mid]
which should bearr[mid]
.
There are also some minor considerations:
using signed index values is necessary for your implementation because you might get a negative index value at
high = mid - 1;
, which will cause the loop to terminate without further issues.yet a signed type for
high
andlow
produces suboptimal code inint mid = low (high - low) / 2;
where right shifting must be adjusted for the potential case of negative values. As the value divided by2
is always positive, you could avoid the adjustment with:int mid = low ((high - low) >> 1); // more efficient but less elegant
the loop uses 2 comparisons in each iteration, reducing the span of the slice one element below half the current span and producing early exit when finding the target, yet it uses almost 50% too many comparisons when the element is not found and it does not return the minimal index when successful. Note also that using 2 tests and a
return
statement prevents the compiler from generating branchless code. The first comparison is likely mispredicted, resulting in suboptimal runtimes on many CPUs.
A simpler loop avoids these shortcomings and allows for a more appropriate unsigned index type size_t
:
int binary_search_iterative(const int *arr, int length, int target)
{
/*
Performs iterative binary search on a sorted integer array.
Parameters
----------
arr : int*
A sorted integer array.
length : int
Number of elements of input array.
target : int
Value to look for in input array.
Returns the index at which the value is found or -1.
*/
size_t low = 0;
size_t high = length;
while (low < high) {
size_t mid = low (high - low) / 2; // uses a simple right shift
//single test, produces branchless code
if (arr[mid] <= target) {
low = mid;
} else {
high = mid;
}
}
if (length > 0 && arr[low] == target) {
// return lowest index for a match
return low;
} else {
// no match. low is the insertion point
// could return ~low to indicate insertion point
return -1;
}
}