I can use the below code to search if the text str
contains any or both of the keys
, i.e.if it contains "MS" or
"dynamics" or
both of them
package main
import (
"fmt"
"regexp"
)
func main() {
keys := []string{"MS", "dynamics"}
keysReg := fmt.Sprintf("(%s %s)|%s|%s", keys[0], keys[1], keys[0], keys[1]) // => "(MS dynamics)|MS|dynamics"
fmt.Println(keysReg)
str := "What is MS dynamics, is it a product from MS?"
re := regexp.MustCompile(`(?i)` keysReg)
matches := re.FindAllString(str, -1)
fmt.Println("We found", len(matches), "matches, that are:", matches)
}
I want the user to enter his phrase, so I trim unwanted words and characters, then doing the search as per above.
Let's say the user input was: This,is,a,delimited,string
and I need to build the keys
variable dynamically to be (delimited string)|delimited|string
so that I can search for my variable str
for all the matches, so I wrote the below:
s := "This,is,a,delimited,string"
t := regexp.MustCompile(`(?i),|\.|this|is|a`) // backticks are used here to contain the expression, (?i) for case insensetive
v := t.Split(s, -1)
fmt.Println(len(v))
fmt.Println(v)
But I got the output as:
8
[ delimited string]
What is the wrong part in my cleaning of the input text, I'm expecting the output to be:
2
[delimited string]
Here is my playground
CodePudding user response:
To quote the famous quip from Jamie Zawinski,
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Two things:
- Instead of trying to weed out garbage from the string ("cleaning" it), extract complete words from it instead.
- Unicode is a compilcated matter; so even after you have succeeded with extracting words, you have to make sure your words are properly "escaped" to not contain any characters which might be interpreted as RE syntax before building a regexp of them.
package main
import (
"errors"
"fmt"
"regexp"
"strings"
)
func build(words ...string) (*regexp.Regexp, error) {
var sb strings.Builder
switch len(words) {
case 0:
return nil, errors.New("empty input")
case 1:
return regexp.Compile(regexp.QuoteMeta(words[0]))
}
quoted := make([]string, len(words))
for i, w := range words {
quoted[i] = regexp.QuoteMeta(w)
}
sb.WriteByte('(')
for i, w := range quoted {
if i > 0 {
sb.WriteByte('\x20')
}
sb.WriteString(w)
}
sb.WriteString(`)|`)
for i, w := range quoted {
if i > 0 {
sb.WriteByte('|')
}
sb.WriteString(w)
}
return regexp.Compile(sb.String())
}
var words = regexp.MustCompile(`\pL `)
func main() {
allWords := words.FindAllString("\tThis\v\x20\x20,\t\tis\t\t,?a!,¿delimited?,string‽", -1)
re, err := build(allWords...)
if err != nil {
panic(err)
}
fmt.Println(re)
}
Further reading: