Is there a way to understand why the time complexity of the following is O(N), either (or both) from intuition or by proof?
What it does is basically that, given an integer array as input, we create the `firstSmallLeft` array of the same length, where `firstSmallLeft[i]` = the first index where `array[index] < array[i]` when doing the back scan (i.e., for index `i`, it scans from `i-1, ..., j` until it finds the first smaller element such that `array[j] < array[i]`).

For example, if `input = [3,2,5,6,4,1]`, then `firstSmallLeft` would be `[-1,-1,1,2,1,-1]`.
// input: int[] array
int[] firstSmallLeft = new int[array.length];
firstSmallLeft[0] = -1;
for (int i = 1; i < array.length; i++) {
    int cur = i - 1;
    while (cur >= 0 && array[cur] >= array[i]) {
        cur = firstSmallLeft[cur];
    }
    firstSmallLeft[i] = cur;
}
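For reference, here is the snippet wrapped in a self-contained runnable class (the class and method names are mine, not from the original), which reproduces the example above:

```java
public class FirstSmallLeft {
    // For each i, the nearest index j < i with array[j] < array[i], or -1 if none.
    static int[] firstSmallLeft(int[] array) {
        int[] result = new int[array.length];
        result[0] = -1;
        for (int i = 1; i < array.length; i++) {
            int cur = i - 1;
            // Jump along previously computed answers instead of stepping back one by one.
            while (cur >= 0 && array[cur] >= array[i]) {
                cur = result[cur];
            }
            result[i] = cur;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] input = {3, 2, 5, 6, 4, 1};
        // Prints [-1, -1, 1, 2, 1, -1]
        System.out.println(java.util.Arrays.toString(firstSmallLeft(input)));
    }
}
```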
CodePudding user response:
Here follows a complete rewrite of my previous (wrong!) answer.
An important insight is that the last entry in `firstSmallLeft` that received a value (at index `i-1`) represents the top of a stack, which is implemented as a linked list. An entry in `firstSmallLeft` represents both an input value (as its index is the index in the input `array`) and a link in the linked list (as its value is an index in `firstSmallLeft`). The bottom element in the stack has a null link (-1).

In general not all elements in `firstSmallLeft` are in the represented stack, since some indices are skipped over. An example:
Let's say that the top of the stack is at index 100 (when `i` is 101), its value is 40, and `firstSmallLeft[40]` is -1. That means we have a stack with just two elements (`array[100]` and `array[40]`), and none of the other indices in `firstSmallLeft` are included in the current stack. Each one of those was once on the stack, but has since been popped off.
Every time `cur = firstSmallLeft[cur];` is executed (in the inner loop), we actually pop an element off the stack (the linked list that starts at `cur` is one element shorter). And when we do `firstSmallLeft[i] = cur`, we push `array[i]` onto the stack, making a reference to the part of the stack we wanted to keep.
Now that this "virtual" stack is identified, we see the following:

- A value is pushed onto the stack exactly once and can be popped from it at most once, after which it is never visited again.
- The overhead of evaluating the `while` condition one extra time, to find that we need to exit the loop, occurs as many times as the outer loop iterates (i.e. `n-1` times).

Therefore the algorithm is O(n).
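To make the stack explicit, here is a sketch (names are my own) that computes the same `firstSmallLeft` array with an `ArrayDeque` of indices. Each index is pushed once and popped at most once, which is the O(n) argument above in concrete form:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ExplicitStackVersion {
    // Same output as the linked-list version, but with the stack made explicit.
    static int[] firstSmallLeft(int[] array) {
        int[] result = new int[array.length];
        Deque<Integer> stack = new ArrayDeque<>(); // indices; values increase bottom to top
        for (int i = 0; i < array.length; i++) {
            // Pop every index whose value is not smaller than array[i].
            // Each index is pushed once, so total pops over all i is at most n.
            while (!stack.isEmpty() && array[stack.peek()] >= array[i]) {
                stack.pop();
            }
            result[i] = stack.isEmpty() ? -1 : stack.peek();
            stack.push(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [-1, -1, 1, 2, 1, -1]
        System.out.println(java.util.Arrays.toString(
                firstSmallLeft(new int[]{3, 2, 5, 6, 4, 1})));
    }
}
```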
CodePudding user response:
You can imagine this algorithm as creating a collection of chains of elements. For each element in the array, following the chain backwards will take you to smaller and smaller values until you walk off the array.
To build the chain for the next element of the array, we start by looking at the chain of the element before us. We’ll then follow that chain for some number of steps, then form the chain for the new element by adding a link back to the place we stopped.
The key intuition here is that the length of the chains drops quickly and grows slowly. More concretely, let C be the length of the chain that starts from the current element of the array. We begin with C = 0. At each step in the algorithm, we drop C by some amount, then increment C by at most one. This means that across all n iterations of the algorithm, C increases at most n times. Therefore, since C is never negative, C can decrease at most n times. And therefore the total work spent in the inner while loop decreasing C, across all iterations of the algorithm, is O(n).
(If you’re familiar with amortized analyses and potential functions, you can also think of this as setting Φ = C.)
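As a quick empirical check of this argument (the instrumentation below is my own, not part of the original code), one can count the inner-loop hops: across all iterations, the total number of `cur = firstSmallLeft[cur]` steps never exceeds n, matching the potential-function bound even on a strictly decreasing input:

```java
public class InnerLoopCount {
    // Returns the total number of inner-loop hops for the given input.
    static int countPops(int[] array) {
        int[] firstSmallLeft = new int[array.length];
        firstSmallLeft[0] = -1;
        int pops = 0;
        for (int i = 1; i < array.length; i++) {
            int cur = i - 1;
            while (cur >= 0 && array[cur] >= array[i]) {
                cur = firstSmallLeft[cur];
                pops++; // each hop shortens the chain by one (C decreases)
            }
            firstSmallLeft[i] = cur;
        }
        return pops;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] worst = new int[n];
        for (int i = 0; i < n; i++) worst[i] = n - i; // strictly decreasing input
        // Even here, total hops stay bounded by n.
        System.out.println(countPops(worst) <= n); // true
    }
}
```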
This same sort of argument arises in a number of other algorithms, such as building a Cartesian tree with a stack and constructing the failure table in the Knuth-Morris-Pratt string matching algorithm.