I am using the `regexp` package to find all matching substrings in Go, but I get an unexpected result. Here is my code:
```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	regexpStr := "\\bPrefix:([a-zA-Z0-9]+[\\w-.]+[^.])#[0-9]+"
	re := regexp.MustCompile(regexpStr)
	matchs := re.FindAllString(str, -1)
	fmt.Println(matchs)
}
```
You can see it in https://go.dev/play/p/XFSMW09MKxV.
Expected:
[Prefix:middle#6 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]
But I got:
[Prefix:middle#6 Prefix:middle#16026 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]
Why is `Prefix:middle#16026` matched? Could someone explain the reason, and how to fix it? Thanks.
Here are the rules for what should match:

I want to extract `Prefix:${middle}#${number}` from a string.

`${middle}` rules:
- Allowed characters: letters, numbers, underscores, dots
- Must begin with a letter or number
- Can't end with a dot

`${number}` rules:
- Should be a number

`Prefix:${middle}#${number}` can appear at the beginning or end of a string, or in the middle of a string, but:
- at the beginning of the string, it needs to be followed by a space or `\n`;
- at the end of the string, it needs to be preceded by a space or `\n`;
- in the middle of the string, it needs to be preceded and followed by a space or `\n`.
Answer:
You can use the following regex with `regexp.FindAllStringSubmatch`:

```
(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)
```
See the regex demo.
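Note that this pattern only works after doubling the whitespace in the string. Both whitespace boundaries, `(?:\s|^)` and `(?:\s|$)`, are consuming patterns (Go's RE2-based `regexp` package does not support lookarounds such as `(?<=\s)` or `(?=\s)`), so when two matches are separated by a single whitespace character, the first match consumes it and the second one can no longer match. Hence, ``regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0")`` or similar should be used before running the above regex.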
Details:
- `(?:\s|^)` - either a whitespace or start of string
- `(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)` - Group 1:
  - `Prefix:` - a fixed string
  - `[a-zA-Z0-9]` - an alphanumeric char
  - `[\w.-]*` - zero or more letters, digits, underscores, dots or hyphens
  - `[^.]` - a char other than `.`
  - `#` - a `#` char
  - `\d+` - one or more digits
- `(?:\s|$)` - either a whitespace or end of string
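One caveat about `[^.]`: it accepts any character other than a dot, including `#`, and because `[a-zA-Z0-9]` and `[^.]` each need their own character, a one-character middle part such as `Prefix:a#1` (which the question's rules allow) is rejected. The sketch below shows a possible tightening that is not part of the original answer: the optional group `(?:[\w.-]*[\w-])?` only allows characters from the answer's own class and forbids a trailing dot. Both patterns are anchored with `^...$` here so that single tokens can be tested; keeping the hyphen follows the answer's `[\w.-]` class and is an assumption relative to the question's rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// answerToken is the answer's Group 1 pattern, anchored to test whole tokens.
var answerToken = regexp.MustCompile(`^Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+$`)

// strictToken is a hypothetical tightening: after the first alphanumeric, an
// optional run of [\w.-] that must end in a word char or hyphen (never a dot).
var strictToken = regexp.MustCompile(`^Prefix:[a-zA-Z0-9](?:[\w.-]*[\w-])?#\d+$`)

func main() {
	tokens := []string{"Prefix:a#1", "Prefix:a##1", "Prefix:middle.#2", "Prefix:middl.e#111"}
	for _, tok := range tokens {
		// Report how each pattern classifies the same token.
		fmt.Printf("%-20s answer=%-5v strict=%v\n", tok, answerToken.MatchString(tok), strictToken.MatchString(tok))
	}
}
```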
See the Go demo:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "Prefix:middle#113 build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	re := regexp.MustCompile(`(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)`)
	// Double every whitespace char first so adjacent matches don't compete for the same separator.
	matchs := re.FindAllStringSubmatch(regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0"), -1)
	for _, m := range matchs {
		fmt.Println(m[1]) // Group 1 holds the Prefix:middle#number token
	}
}
```
Output:
Prefix:middle#113
Prefix:middle#6
Prefix:middle#110
Prefix:middl.e#111
Prefix:middle#112