Using package `regexp` to find all matching substrings in Golang, but getting an unexpected result

Time:09-20

I am using the regexp package to find all matching substrings in Go, but I get an unexpected result. Here is my code:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
    regexpStr := "\\bPrefix:([a-zA-Z0-9]+[\\w-.]+[^.])#[0-9]+"
    re := regexp.MustCompile(regexpStr)
    matchs := re.FindAllString(str, -1)
    fmt.Println(matchs)
}

You can see it in https://go.dev/play/p/XFSMW09MKxV.

Expected:

[Prefix:middle#6 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]

But I got:

[Prefix:middle#6 Prefix:middle#16026 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]

Why is Prefix:middle#16026 matched? Could someone tell me the reason, and how to fix it? Thanks.

Here are the rules for what should match:

I want to extract Prefix:${middle}#${number} from a string.

  • ${middle} rules:

    • Allowed characters: letters, numbers, underscores, hyphens, dots
    • Must begin with a letter or number
    • Can't end with a dot
  • ${number} rules:

    • Should be a number
  • Prefix:${middle}#${number} can appear at the beginning, at the end, or in the middle of a string, but:

    • At the beginning of the string, it must be followed by a space or \n;
    • At the end of the string, it must be preceded by a space or \n;
    • In the middle of the string, it must be both preceded and followed by a space or \n.

Answer:

You can use the following regex with regexp.FindAllStringSubmatch:

(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)

See the regex demo.

Note that this pattern only works after every whitespace character in the string has been doubled, because both whitespace boundaries, (?:\s|^) and (?:\s|$), are consuming patterns: the whitespace that ends one match cannot also start the next, so a match separated from the previous one by a single whitespace character would be skipped. Hence, regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0") or similar should be run on the input before applying the above regex.
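The effect of the consuming boundaries can be seen on a small input. The following is a minimal sketch, not part of the original demo: the input string and the helper names extract and doubleSpaces are made up for illustration, and the pattern is written with \d+ for the digits.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern is the regex from the answer; the whitespace on both sides is consumed.
var pattern = regexp.MustCompile(`(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)`)

// extract (hypothetical helper) returns capture group 1 of every match in s.
func extract(s string) []string {
	var out []string
	for _, m := range pattern.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

// doubleSpaces (hypothetical helper) duplicates each whitespace character,
// so that two neighbouring matches no longer compete for the same one.
func doubleSpaces(s string) string {
	return regexp.MustCompile(`\s`).ReplaceAllString(s, "$0$0")
}

func main() {
	s := "Prefix:a1#1 Prefix:b2#2 Prefix:c3#3" // illustrative input

	// The space after the first item is consumed by the first match,
	// so the second item has no leading whitespace left and is skipped.
	fmt.Println(extract(s)) // [Prefix:a1#1 Prefix:c3#3]

	// With doubled whitespace every item has its own leading and trailing space.
	fmt.Println(extract(doubleSpaces(s))) // [Prefix:a1#1 Prefix:b2#2 Prefix:c3#3]
}
```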

Details:

  • (?:\s|^) - either a whitespace or start of string
  • (Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+) - Group 1:
    • Prefix: - a fixed string
    • [a-zA-Z0-9] - an alphanumeric
    • [\w.-]* - zero or more letters, digits, underscores, dots or hyphens
    • [^.] - a char other than .
    • # - a # char
    • \d+ - one or more digits
  • (?:\s|$) - either a whitespace or end of string

See the Go demo:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "Prefix:middle#113 build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
    re := regexp.MustCompile(`(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)`)
    matchs := re.FindAllStringSubmatch(regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0"), -1)
    for _, m := range matchs {
        fmt.Println(m[1])
    }
}

Output:

Prefix:middle#113
Prefix:middle#6
Prefix:middle#110
Prefix:middl.e#111
Prefix:middle#112
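If doubling the whitespace feels awkward, one possible alternative (a different technique from the one above, sketched here under assumptions) is to split the input on whitespace first and test each field against the inner pattern anchored with ^ and $. The names token and extractFields are made up for this sketch. Note that strings.Fields splits on Unicode whitespace, which is slightly broader than \s in Go's regexp syntax.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// token is the inner pattern from the answer, anchored to a whole field.
var token = regexp.MustCompile(`^Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+$`)

// extractFields (hypothetical helper) keeps the whitespace-separated
// fields of s that match token in full.
func extractFields(s string) []string {
	var out []string
	for _, f := range strings.Fields(s) {
		if token.MatchString(f) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	str := "Prefix:middle#113 build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	for _, m := range extractFields(str) {
		fmt.Println(m)
	}
}
```

On the demo string this should print the same five items as the output above, since the glued pair Prefix:middle#16026Prefix:middle#1111 forms a single field that fails the anchored pattern.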