What's the correct way to find the longest substring match? I’ve been trying to find the longest substring match with the regexp package in Go: https://pkg.go.dev/regexp
Specifically the longest consecutive match of vowels in a string. Any length or combination of "aeiou".
Here's the code I have now. It works until the strings get too long resulting in a panic (over 1000). I start searching for the longest substring and return once a match is found.
s := "chrononhotonthuooaos"
for i := utf8.RuneCountInString(s); i >=0; i-- {
exp := "[aeiou]{"
exp = fmt.Sprint(i)
exp = "}"
regexStr := regexp.MustCompile(exp)
longestStr := regexStr.FindStringSubmatch(s)
if longestStr != nil {
fmt.Println("longest submatch found:", longestStr)
return utf8.RuneCountInString(longestStr[0])
}
}
return 0
CodePudding user response:
The reason you are getting panics is because your are creating a new regex on every iteration — a 10,000 character string? That's 10,000 compiled regexen, and I'm pretty sure those get cached. Not to mention that regex compilation is expensive.
So why use a regex at all? Something like this is almost certainly faster, doesn't create any intermediate objects, and runs in O(n) time and space complexity. It will work for strings of any length whatsoever:
https://goplay.tools/snippet/nvZwWeEF8bC
func longest_vowel_run(s string) (string, int, int) {
slen := len(s)
bgn := 0
end := 0
max := 0
x := 0
y := 0
// while we still have chars left...
for x < len(s) {
// find the first vowel
for x < slen && !vowel[s[x]] {
x
}
// find the next non-vowel
for y = x; y < slen && vowel[s[y]]; {
y
}
// if the current run is longer than the max run, update the max run
if z := y - x; z > max {
bgn = x
end = y
max = z
}
// pick up where we left off
x = y
}
var maxRun string
if max > 0 {
maxRun = s[bgn:end]
}
return maxRun, bgn, end - 1
}
var vowel = map[byte]bool{
'a': true, 'A': true,
'e': true, 'E': true,
'i': true, 'I': true,
'o': true, 'O': true,
'u': true, 'U': true,
}
CodePudding user response:
You can do the following:
re := regexp.MustCompile(`[aeiou] `)
matches := re.FindAllStringSubmatch(s, -1)
if len(matches) > 0 {
longest := 0
for i := range matches {
if len(matches[i]) >= len(matches[longest]) {
longest = i
}
}
fmt.Println(matches[longest])
}
https://go.dev/play/p/loxIsR3LMOM
Or, without regexp, you can do:
s := "chrononhotonthuooaos"
var start, end int
for i := 0; i < len(s); i {
switch s[i] {
case 'a', 'e', 'i', 'o', 'u':
j := i 1
for ; j < len(s); j {
switch s[j] {
case 'a', 'e', 'i', 'o', 'u':
continue
}
break
}
if j-i > end-start {
start, end = i, j
}
i = j
}
}
fmt.Println(s[start:end])