So I'm following a algorithms course and it asked to implement the following function:
Write a function called findLongestSubstring, which accepts a string and returns the length of the longest substring with all distinct characters.
findLongestSubstring('') // 0
findLongestSubstring('bbbbbb') // 1
findLongestSubstring('longestsubstring') // 8
And said it has to have a time complexity of o(n). So I could not come up with a o(n) solution, but a supposed (n^2)... my naive solution:
function findLongestSubstring(str: string): number {
const array = str.split("");
let maxLength = 0;
for (let i = 0; i < array.length; i ) {
const letterArray = [array[i]];
for (let j = i 1; j < array.length; j ) {
if (letterArray.find(e => e === array[j])) {
if (letterArray.length > maxLength) maxLength = letterArray.length;
break;
} else {
letterArray.push(array[j])
if (j === array.length - 1) {
if (letterArray.length > maxLength) maxLength = letterArray.length;
break;
}
}
}
}
return maxLength;
}
Then I went to check his solution:
function findLongestSubstring(str: string) {
let longest = 0;
let seen = {};
let start = 0;
for (let i = 0; i < str.length; i ) {
let char = str[i];
if (seen[char]) {
start = Math.max(start, seen[char]);
}
// index - beginning of substring 1 (to include current in count)
longest = Math.max(longest, i - start 1);
// store the index of the next char so as to not double count
seen[char] = i 1;
}
return longest;
}
So I do agree that my solution which has a nested for loop appears to be O(n2) and his solution o(n). But when I tested both algorithms with a really big string to my surprise my algorithm was acctually faster and i have NO idea why... can someone enlight me? The way Im testing:
test("really big string to test bigO", () => {
const alphabet = "abcdefghijklmnopqrstuvwxyz"
const randomLettersArray = Array.from({ length: 100000000 }, () => alphabet[Math.floor(Math.random() * alphabet.length)]);
const randomBigString = randomLettersArray.join("")
const start = Date.now();
findLongestSubstring(randomBigString);
const end = Date.now();
console.log(`Execution time: ${end - start} ms`);
})
CodePudding user response:
Your nested loop solution looks like it could take O(n2) time, but it's actually limited by the longest possible valid substring, and that is limited by the size of the alphabet.
Your test uses a small alphabet -- only 26 characters. Your inner loop has a maximum of 26 iterations in this case, and that just might be faster than the other algorithm in some environments.
If you were to use a much larger alphabet -- say 10000 characters or so, then your algorithm would be much slower, but the other one would not slow down much at all.
CodePudding user response:
Let us condider all longest subtrings (meeting the distinctness condition) that end at the successive positions:
l -> l
lo -> lo
lon -> lon
long -> long
longe -> longe
longes -> longes
longest -> longest
longests -> ts
longestsu -> tsu
longestsub -> tsub
longestsubs -> ubs
longestsubst -> ubst
longestsubstr -> usbtr
longestsubstri -> ubstri
longestsubstrin -> ubstrin
longestsubstring -> ubstring
You will notice that the index of the first character never descreases. Every time one appends a new character, the starting index either does not change, or it moves past the position of the same character in the substring.
So the following loop could do:
the substring is empty
for all ending positions in the string:
append the new character to the substring
move the starting position past the same character in the substring, if any
But we are not done yet: if the new character is found in the substring, the starting cursor will stop there; but if it is not found, we need to traverse the whole substring for nothing, and this can make the complexity quadratic.
So we need an extra device to tell us if the substring contains the new character without traversing the substring. As the alphabet has a finite size, we can do with an array of booleans which will act as counters. Initially all booleans are zero; when a character enters/exits the substring, its entry in the array is set/reset.
Hence the whole search can be performed in linear time. Of course the solution is the length of the longest string found during this process.
all booleans are initially reset
the substring is empty
for all ending positions in the string:
append the new character to the substring
if the boolean for this character is set:
move the starting position past the same character in the substring,
while resetting the booleans of all characters met
set the boolean for this character
update the longest length so far