Home > Software engineering >  Time complexity of while inside for loop?
Time complexity of while inside for loop?

Time:12-29

In a Java app, I have the following algorithm that is used for "Longest Substring with K Distinct Characters" as shown below:

Input: String="araaci", K=2 Output: 4 Explanation: The longest substring with no more than '2' distinct characters is "araa".

Input: String="cbbebi", K=3 Output: 5 Explanation: The longest substrings with no more than '3' distinct characters are "cbbeb" & "bbebi".

Here is the code:

public static int longestSubstring(String str, int k) {
    Map<Character, Integer> map = new HashMap<>();
    int maxLength = 0;
    int l = 0;

    for (int r = 0; r < str.length(); r  ) {
        char cRight = str.charAt(r);
        map.put(cRight, map.getOrDefault(cRight, 0)   1);

        while (map.size() > k) {
            char cLeft = str.charAt(l);
            map.put(cLeft, map.getOrDefault(cLeft, 0) - 1);
            if (map.get(cLeft) == 0) {
                map.remove(cLeft);
            }
            l  ;
        }
        maxLength = Math.max(maxLength, r - l   1);
    }
    return maxLength;
}

I could not understand the time complexity in the following definition:

Time Complexity The time complexity of the above algorithm will be O(N) where ‘N’ is the number of characters in the input string. The outer for loop runs for all characters and the inner while loop processes each character only once, therefore the time complexity of the algorithm will be O(N N) which is asymptotically equivalent to O(N).

So, I thought when there is a while loop inside another for loop, I thought time complexity is O(n^2). But here I could not understand "inner while loop processes each character only once". Can you explain this state if it is correct?

CodePudding user response:

In order to analyse the complexity of an algorithm, most of the time you'll need to understand what the code does in details (you don't need to understand if what it does is correct tho). Using the structure of the code (ie. whether loops are nested) or just looking at the big picture is usually a bad idea. In other words, computing the complexity of an algorithm takes a lot of (your) time.

As you stated "the inner while loop processes each character only once", which is indeed important to notice, but that's not sufficient in my opinion.

Loops do not matter per se, what matters is the total number of instructions your program will run depending on the input size. You can read "instruction" as "a function that runs in constant time" (independently of the input size).

Making sure all function calls are in O(1)

Let's first look at the complexity of all function calls:

  • We have several a map reads, all in O(1) (read this as "constant time read"):
    • map.getOrDefault(cRight, 0)
    • map.getOrDefault(cLeft, 0)
  • Also several map insertions, also all in O(1):
    • map.put(cRight, ...)
    • map.put(cLeft, ...)
  • And map item deletion map.remove(cLeft), also in O(1)
  • The Math.max(..., ...) is also in O(1)
  • str.charAt(..) is also in O(1)
  • There is also increments/decrements of loop variables and check their values and also a few other 1, -1 that are all in O(1).

Ok, now we can safely say that all external functions are "instructions" (or more accurately, all these function use a "constant size number of instructions"). Notice that all hashmap complexities are not exactly accurate but this is a detail you can look at separately

Which means we now only need to call how many of these functions are called.

Analyzing the number of function calls

The argument made in the comments is accurate but using the fact that char cLeft = str.charAt(l) will crash the program if l > N is not very satisfactory in my opinion. But this is a valid point, its impossible for the inner loop to be executed more than l times in total (which leads directly to the expected O(N) time complexity).

If this was given as an exercise, I doubt that was the expected answer. Let's analyze the program like it was written using char cLeft = str.charAt(l % str.length()) instead to make it a little more interesting.

I feel like the main argument should be based on the "total number of character count" stored in map (a map of character to counter pairs). Here are some facts, mainly:

  1. The outter loop always increases a single counter by exactly one.
  2. The inner loop always decreases a single counter by exactly one.

Also:

  1. The inner loop ensure that all counters are positive (removes counters when they are equal to 0).
  2. The inner loop runs as long as the number of (positive) counters is > k.

Lemma For the inner loop to be executed C times in total, the outer loop needs to be executed at least C times in total.

Proof Suppose the outer loop is executed C times, and the inner loop at least C 1 times, that means it exists an iteration r of the outer loop in which the inner loop is executed r 1 times. During this iteration r of the outer loop, at some point (by 1. and 2.) the sum of all character counters in map will equal 0. By fact 3., this means that there are no counter left (map.size() equals 0). Since k > 0, during iteration r it is impossible to enter the inner loop for a r 1 time because of 4.. A contradiction that proves the Lemma is true.

Less formally, the inner loop will never execute if the sum of counters is 0 because the total sum of k (> 0) positives counters is greater than 0. In other words, the consumer (inner loop) can't consume more than what is being produced (by the outer loop).

Because of this Lemma and because the outer loop executes exactly N times, the inner loop executes at most N times. In total we will execute at most A * N function calls in the outter loop, and B * N function calls in the inner loop, both A and B are constants and all functions are in O(1), therefore (A B) * N ∈ O(N).

Also note that writing O(N N) is a pleonasm (or doesn't make sense) because big-O is supposed to ignore all constant factors (both multiplicative and additive). Usually people will not write equations using big-O notations because it is hard to write something correct and formal (appart for obvious set inclusions like O(log N) ∈ O(N)). Usually you would say something like "all operations are in O(N), therefore the algorithm is in O(N)".

  • Related