I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).
For example, I need to check whether the word "can" is in the sentence "I can run fast". If I use strings.Contains("I can run fast", "can"), it returns true. But strings.Contains("I cannot run fast", "can") also returns true, because "cannot" contains "can". How can I match the word exactly, so that "can" gives true and "cannot" gives false in the scenario above?
CodePudding user response:
Just as a first attempt, you can try using a regular expression:
import "regexp"

// containsCanRegex matches "can" or "Can" as a standalone word.
var containsCanRegex = regexp.MustCompile(`\b[Cc]an\b`)

func containsCan(s string) bool {
	return containsCanRegex.MatchString(s)
}
Note that this also matches title case, so it matches "Can I go?".
The \b in a regular expression matches a "word boundary": a position with a word character on one side and a non-word character, the beginning of the text, or the end of the text on the other side. In Go's regexp package, \b is an ASCII word boundary.
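To see the word boundary in action, here is a minimal sketch that runs the same pattern against a few sentences. The variable name wordCan is only chosen for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordCan matches "can" or "Can" only when it is delimited by word boundaries.
var wordCan = regexp.MustCompile(`\b[Cc]an\b`)

func main() {
	for _, s := range []string{
		"I can run fast",    // "can" stands alone
		"I cannot run fast", // "can" is followed by the word character 'n'
		"Can I go?",         // title case, at the beginning of the text
	} {
		fmt.Printf("%q -> %v\n", s, wordCan.MatchString(s))
	}
}
```

This should report true, false, and true, in that order.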
Note that this will match "can't", because \b treats ' as a word boundary (since it is a non-word character). It sounds like this is not what you want. To come up with a more general solution, you first need to decide how general it should be. A very basic approach would be to split the text into words first, and then check whether any of those words equals "can". You could split the words with a regular expression or by using a text segmentation library.
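A rough sketch of the split-then-compare idea, using only the standard library, might look like this. The helper containsWord is hypothetical, and it assumes that a "word" is a run of letters and ASCII apostrophes, which is a simplification.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// containsWord reports whether text contains word as a whole token.
// Assumption: tokens are runs of letters and ASCII apostrophes, and the
// comparison is case-insensitive.
func containsWord(text, word string) bool {
	isSeparator := func(r rune) bool {
		return !unicode.IsLetter(r) && r != '\''
	}
	for _, token := range strings.FieldsFunc(text, isSeparator) {
		if strings.EqualFold(token, word) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsWord("I can run fast", "can"))    // true
	fmt.Println(containsWord("I cannot run fast", "can")) // false
	fmt.Println(containsWord("I can't run fast", "can"))  // false
}
```

Because the apostrophe stays inside the token, "can't" is compared as a whole and does not equal "can". The trade-off is that a word wrapped in single quotes, such as 'can', would not match either.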
I don't know how to write a regular expression that would accept "can" but reject "can't" in a sentence: the "regexp" package does not support negative lookahead.
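One possible workaround, offered only as a sketch, is to keep the simple pattern and do the "lookahead" by hand: find each match position and inspect what follows it. The function containsCanStrict is a hypothetical name, and treating both ' and the typographic apostrophe (U+2019) as disqualifying is an assumption about what you want.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var canRegex = regexp.MustCompile(`\b[Cc]an\b`)

// containsCanStrict reports whether s contains "can" as a word that is not
// immediately followed by an apostrophe (so "can't" is rejected).
func containsCanStrict(s string) bool {
	for _, loc := range canRegex.FindAllStringIndex(s, -1) {
		rest := s[loc[1]:] // text directly after this match
		if strings.HasPrefix(rest, "'") || strings.HasPrefix(rest, "\u2019") {
			continue // looks like a contraction; try the next match
		}
		return true
	}
	return false
}

func main() {
	fmt.Println(containsCanStrict("I can run fast"))    // true
	fmt.Println(containsCanStrict("I cannot run fast")) // false
	fmt.Println(containsCanStrict("I can't run fast"))  // false
}
```

This also rejects "can" when it is followed by a closing single quote, so it is only suitable if that case does not matter to you.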
CodePudding user response:
I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).
I'm trying to implement a filter for a given set of words.
Here's a proof-of-concept (POC) solution that splits the text into words and looks each one up in a set. It can distinguish between "can", "cannot", and "can't".
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// newFilter builds a set of lowercased words to look for.
func newFilter(words []string) map[string]struct{} {
	filter := make(map[string]struct{}, len(words))
	for _, word := range words {
		if len(word) > 0 {
			filter[strings.ToLower(word)] = struct{}{}
		}
	}
	return filter
}

// applyFilter reports whether text contains any word from filter as a whole word.
func applyFilter(text string, filter map[string]struct{}) bool {
	const rApostrophe = '\u0027'
	const sApostrophe = string(rApostrophe)
	const rSoftHyphen = '\u00AD'
	const sSoftHyphen = string(rSoftHyphen)

	// Split on anything that is not a letter, a soft hyphen, or an apostrophe.
	split := func(r rune) bool {
		return !unicode.IsLetter(r) && r != rSoftHyphen && r != rApostrophe
	}

	words := strings.FieldsFunc(text, split)
	for _, word := range words {
		// Soft hyphens are invisible line-break hints; drop them.
		if strings.Contains(word, sSoftHyphen) {
			word = strings.ReplaceAll(word, sSoftHyphen, "")
		}
		// Strip a trailing apostrophe (plural possessive or closing quote).
		if strings.HasSuffix(word, sApostrophe) {
			word = word[:len(word)-1]
		}
		// Strip a trailing possessive 's.
		if strings.HasSuffix(word, sApostrophe+"s") {
			word = word[:len(word)-2]
		}
		word = strings.ToLower(word)
		if _, ok := filter[word]; ok {
			return true
		}
	}
	return false
}

func main() {
	filter := newFilter([]string{"can"})
	text := "I can run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // false

	filter = newFilter([]string{"cannot", "can't"})
	text = "I can run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // true
}