Binary Search in JS: trying to find a consistent mental model-CodePudding

I am grinding leetcode these days and I encountered this question about using binary search to find a peak element in an array: https://leetcode.com/problems/find-peak-element/

I know we can think of the array as alternating ascending and descending sequences. Here is my solution

var findPeakElement = function(nums) {
    if(nums.length <= 1) return 0
    let left = 0, right = nums.length - 1
    
    while(left <= right) {
        const mid = left   right >>> 1
        if(nums[mid] > nums[mid   1]) {
            right = mid - 1
        } else {
            left = mid   1
        }
    }
    
    
    return left === nums.length ? left - 1 : left
};

If the nums[mid] is bigger than the next element in the array that we know we are in the descending sub array and the peak element must be lying in the left, and vice versa if then nums[mid] is smaller than the next element. So far so good. But what confused me is which index I should return eventually - left or right? To figure this out I need to go through a bunch of trial and error.

And if I slightly tweek the question to find the valley element e.g. [1, 3, 20, 4, 1, 0]'s valley elements should be 0. While I can reason about how we narrow the window but I still cannot seem to figure out which index I should return at the end of the binary search.

Here is my attempt for returning the valley element in the array by mirroring what I did for findPeakElement

var findValleyElement = function (nums) {
  if (nums.length <= 1) return 0
  let left = 0,
    right = nums.length - 1

  while (left <= right) {
    const mid = (left   right) >>> 1
    if (nums[mid] > nums[mid   1]) {
      left = mid   1
    } else {
      right = mid - 1
    }
  }

  return right
}

But this time I cannot use right as the returned index. I need to use left instead. I cannot seem to think of a consistent way of thinking through this without going through a bunch of examples, which is really not ideal since you still might miss some edge cases.

So my question is, is there some consistent mental model we can adopt when thinking about these binary search problems, specifically which index we should return to satisfy the requirements.

CodePudding user response：

When the following condition is true:

if(nums[mid] > nums[mid   1]) {

...then it could be that mid is a solution, maybe even the only one. So that means you shouldn't exclude it from the range, yet with right = mid - 1 you do exclude it. You should set right = mid. To then avoid a potentially endless loop, the loop condition should be left < right. This will ensure the loop will always end: the range is guaranteed to become smaller in each iteration.

Once the loop exits, left has become equal to right. The only possible index in that range (of 1) is that index.

There is now no more need to check whether left is nums.length, as this cannot happen: with our chosen while condition, left can never become greater than right, ... only equal to it. And since right is a valid index, no such out-of-range check is needed.

Also the case of array size 1 does not need special treatment now:

So:

var findPeakElement = function(nums) {
    let left = 0,
        right = nums.length - 1;

    while (left < right) {
        const mid = (left   right) >>> 1;
        if (nums[mid] > nums[mid   1]) {
            right = mid;
        } else {
            left = mid   1;
        }
    }

    return left;
};

CodePudding user response：

A peak is defined as any element whose neighbours are both less than the element. In the example below, there are are two peaks, 5 and 4 -

        5,
          4,        4,
      3,          3,  3,
    2,      2,  2,      2 ]
[ 1,          1,

So we can take 3 elements off the input, a, b, and c and -

if any a, b, or c is null, a valid comparison cannot be made and therefore there is no peak. stop the program
otherwise if a < b and b > c, a peak has been found. output the peak
finally drop a, and recur on the same input to search for additional peaks

That would look something like this -

function* peaks ([ a, b, c, ...more ]) {
  if (a == null || b == null || c == null) return // 1
  if (a < b && b > c) yield b                     // 2
  yield *peaks([ b, c, ...more ])                 // 3
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)

found peak 2
found peak 5
found peak 7

If you have a significantly large input, which I'm sure LeetCode will give you, handling arrays like this will create an enormous amount of wasteful intermediate values. A better approach would be to use an index, i -

function* peaks (t, i = 0) {
  let a = t[i], b = t[i   1], c = t[i   2]
  if (a == null || b == null || c == null) return // 1
  if (a < b && b > c) yield b                     // 2
  yield *peaks(t, i   1)                          // 3
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)

found peak 2
found peak 5
found peak 7

And finally the use of recursion will limit the size of input that this program can handle. We can use a for loop to avoid any recursion limits -

function* peaks (t) {
  let a, b, c
  for (let i = 0; i<t.length; i  ) {
    a = t[i], b = t[i   1], c = t[i   2]
    if (a == null || b == null || c == null) return // 1
    if (a < b && b > c) yield b                     // 2
  }
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)

found peak 2
found peak 5
found peak 7

In the last two example we perform three array lookups per step, t[i], t[i 1], and t[i 2]. As an optimization we can reduce this to just a single lookup -

function* peaks (t) {
  let a, b, c
  for (let i = 0; i<t.length; i  ) {
    a = b, b = c, c = t[i]
    if (a == null || b == null || c == null) continue
    if (a < b && b > c) yield b
  }
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)

found peak 2
found peak 5
found peak 7

This works because our program effectively shifts the elements through a, b, and c in a "leftward" direction. Note the peaks in the b column -

a	b	c	...
null	null	1	2,1,3,4,5,4,2,1,5,6,7,4
null	1	2	1,3,4,5,4,2,1,5,6,7,4
1	2 (peak)	1	3,4,5,4,2,1,5,6,7,4
2	1	3	4,5,4,2,1,5,6,7,4
1	3	4	5,4,2,1,5,6,7,4
3	4	5	4,2,1,5,6,7,4
4	5 (peak)	4	2,1,5,6,7,4
5	4	2	1,5,6,7,4
4	2	1	5,6,7,4
2	1	5	6,7,4
1	5	6	7,4
5	6	7	4
6	7 (peak)	4

With our optimised program, we can drop several other unnecessary actions. Index i is no longer needed and we can skip having to worry about off-by-one errors caused by i and comparisons of i<t.length. Additionally, we can skip the c == null check as c will always represent an element of the input array -

function* peaks (t) {
  let a, b, c
  for (const v of t) {
    a = b, b = c, c = v
    if (a == null || b == null) continue
    if (a < b && b > c) yield b
  }
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)

found peak 2
found peak 5
found peak 7

If you want to collect all peaks, you can use Array.from to convert any iterable into an array -

const allPeaks = peaks([1,2,1,3,4,5,4,2,1,5,6,7,4])
console.log(allPeaks)

[2, 5, 7]

Generators are a good fit for this kind of problem because they can be paused/canceled at any time, ie after the first peak is found -

const firstPeak (t) {
  for (const p of peaks(t))
    return p                 // <- immediately stops `peaks`
}

firstPeak([1,2,1,3,4,5,4,2,1,5,6,7,4])

If however you want to write firstPeaks without using a generator, there's nothing stopping you from doing so. Instead of using yield you can simply return -

function firstPeak (t) {
  let a, b, c
  for (const v of t) {
    a = b, b = c, c = v
    if (a == null || b == null) continue
    if (a < b && b > c) return b          // <-
  }
}

console.log("first peak", firstPeak([1,2,1,3,4,5,4,2,1,5,6,7,4]))

first peak 2

CodePudding user response：

Such algorithms are usually expressed better on the semi-open ranges. In other words, right should be an invalid index, one-off beyond the range. With such convention in mind, consider a rewrite:

let left = 0, right = nums.length

while(right - left > 2) {
    const mid = left   right >>> 1
    if(nums[mid] > nums[mid   1]) {
        right = mid
    } else {
        left = mid   1
    }
}
return left

The loop finishes when right - left is 1; that is only one element remains in the range, and it is the one to be returned. Since right does not belong to the range to begin with, and idea to returning it would never cross your mind.

For the reference, this is a standard convention in STL.