Home > Back-end >  How to find the longest matching substring in Go
How to find the longest matching substring in Go

Time:02-25

What's the correct way to find the longest substring match? I’ve been trying to find the longest substring match with the regexp package in Go: https://pkg.go.dev/regexp

Specifically the longest consecutive match of vowels in a string. Any length or combination of "aeiou".

Here's the code I have now. It works until the strings get too long resulting in a panic (over 1000). I start searching for the longest substring and return once a match is found.

s := "chrononhotonthuooaos"
for i := utf8.RuneCountInString(s); i >=0; i-- {
    exp := "[aeiou]{"
    exp  = fmt.Sprint(i)
    exp  = "}"
    regexStr := regexp.MustCompile(exp)
    longestStr := regexStr.FindStringSubmatch(s)
    if longestStr != nil {
        fmt.Println("longest submatch found:", longestStr)
        return utf8.RuneCountInString(longestStr[0])
    }
}
return 0

CodePudding user response:

The reason you are getting panics is because your are creating a new regex on every iteration — a 10,000 character string? That's 10,000 compiled regexen, and I'm pretty sure those get cached. Not to mention that regex compilation is expensive.

So why use a regex at all? Something like this is almost certainly faster, doesn't create any intermediate objects, and runs in O(n) time and space complexity. It will work for strings of any length whatsoever:

https://goplay.tools/snippet/nvZwWeEF8bC

func longest_vowel_run(s string) (string, int, int) {
    slen := len(s)
    bgn := 0
    end := 0
    max := 0
    x := 0
    y := 0

    // while we still have chars left...
    for x < len(s) {

        // find the first vowel
        for x < slen && !vowel[s[x]] {
            x  
        }
        // find the next non-vowel
        for y = x; y < slen && vowel[s[y]]; {
            y  
        }

        // if the current run is longer than the max run, update the max run
        if z := y - x; z > max {
            bgn = x
            end = y
            max = z
        }

        // pick up where we left off
        x = y

    }

    var maxRun string
    if max > 0 {
        maxRun = s[bgn:end]
    }

    return maxRun, bgn, end - 1
}

var vowel = map[byte]bool{
    'a': true, 'A': true,
    'e': true, 'E': true,
    'i': true, 'I': true,
    'o': true, 'O': true,
    'u': true, 'U': true,
}

CodePudding user response:

You can do the following:

re := regexp.MustCompile(`[aeiou] `)

matches := re.FindAllStringSubmatch(s, -1)
if len(matches) > 0 {
    longest := 0
    for i := range matches {
        if len(matches[i]) >= len(matches[longest]) {
            longest = i
        }
    }
    fmt.Println(matches[longest])
}

https://go.dev/play/p/loxIsR3LMOM


Or, without regexp, you can do:

s := "chrononhotonthuooaos"

var start, end int
for i := 0; i < len(s); i   {
    switch s[i] {
    case 'a', 'e', 'i', 'o', 'u':
        j := i   1
        for ; j < len(s); j   {
            switch s[j] {
            case 'a', 'e', 'i', 'o', 'u':
                continue
            }
            break
        }
        if j-i > end-start {
            start, end = i, j
        }
        i = j
    }
}

fmt.Println(s[start:end])

https://go.dev/play/p/XT5-pr3n0tl

  • Related