I'm currently prepping for a technical interview and practicing some data structures and algorithms questions with Python. There is a common question that asks you to find the longest substring in a string, such that that substring contains no repeated characters. Intuitively, I understand how to use a sliding window to solve this problem, which can be done with something like:
def longest_substring(s: str) -> int:
longest_sub_string = 0
if len(s) == 1:
return 1
for window_size in range(len(s) 1, 0, -1):
for i in range(len(s) - window_size 1):
window = s[i:i window_size]
if not self.contains_repeats(window) and len(window) > longest_sub_string:
longest_sub_string = len(window)
return longest_sub_string
def contains_repeats(s: str = None) -> bool:
splt = list(s)
if len(list(set(splt))) < len(splt):
return True
However, this solution is not efficent for very long input strings taking something like O(n^2) time. I've found an alternative sliding window implementation:
def longest_substring(s: str) -> int:
last_idxs = {}
max_len = 0
start_idx = 0
for i in range(0, len(s)):
if s[i] in last_idxs:
start_idx = max(start_idx, last_idxs[s[i]] 1)
max_len = max(max_len, i-start_idx 1)
last_idxs[s[i]] = i
return max_len
which solves the problem in linear time. I've picked apart what the code is doing and understand the individual parts, but cannot connect it to how a sliding window works, which is preventing me from being able to apply this approach to different problems. I could just memorize the code, but I'd like to understand how what's happeing in the second code block is similar to what's happening in the first. Can anyone explain this in a strighforward way that shows how this second variation implements a sliding window?
CodePudding user response:
I wouldn't call the first code a "sliding-window algorithm". It appears to simply be a bruteforce algorithm that tries all possible substrings. It might take time proportional to n³, because there are n² substrings to check, and applying contains_repeats
might take time proportional to n per substring in the worst-case. Or rather, if k is the number of distinct characters in the alphabet, then contains_repeats
can take time proportional to up to min(n, k); thus the first algorithm can take time up to n² min(n, k)
.
The second algorithm is a sliding-window algorithm. The window goes from index start_idx
to index i
. It takes time proportional to n because each of those two indices can only increase, never decrease, so the total number of iterations is at most 2n. Instead of trying all possible substrings, the algorithms has a varying-size "window" that "slides" from left to right (and never goes back).
Note that "sliding-window algorithm" is a term with at least two distinct meanings. In addition to this kind of string algorithms, it is sometimes used to refer to algorithms that use convolutions.