How to check whether a given word exists in a sentence (string) without using the Contains function

Time:03-08

I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).

For example, I need to check whether the word "can" is in the sentence "I can run fast". strings.Contains("I can run fast", "can") returns true, but strings.Contains("I cannot run fast", "can") also returns true, because "cannot" contains "can". How can I match only the exact word, so that "can" gives true and "cannot" gives false?

CodePudding user response:

Just as a first attempt, you can try using a regular expression:

import "regexp"

var containsCanRegex = regexp.MustCompile(`\b[Cc]an\b`)

func containsCan(s string) bool {
    return containsCanRegex.MatchString(s)
}

Note that the pattern also accepts the capitalized form, so it matches "Can I go?".

The \b in a regular expression matches a "word boundary": a position with a word character on one side and a non-word character, the beginning of the text, or the end of the text on the other. In Go's regexp package, word characters are the ASCII letters, digits, and underscore.
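To make the word-boundary behaviour concrete, here is a small sketch that runs the same pattern against a few inputs. The helper name matchesCan and the sample sentences are only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// canWord is the same pattern as in the answer: "can" or "Can"
// with a word boundary on each side.
var canWord = regexp.MustCompile(`\b[Cc]an\b`)

// matchesCan is a hypothetical helper name used only for this sketch.
func matchesCan(s string) bool {
	return canWord.MatchString(s)
}

func main() {
	samples := []string{
		"I can run fast",    // "can" stands alone between spaces
		"I cannot run fast", // "can" is followed by the word character "n"
		"Can I go?",         // start of text counts as a boundary
		"scan it",           // "can" is preceded by the word character "s"
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, matchesCan(s))
	}
}
```

Only the first and third samples should report true, since the other two have a word character directly next to "can".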

Note that this also matches "can't", because the apostrophe is a non-word character and so \b sees a word boundary before it. That is probably not what you want. Before looking for a more general solution, decide how general it needs to be. A very basic approach is to split the text into words first and then check whether any of them equals "can". You could split with a regular expression or with a text segmentation library.
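A minimal sketch of the split-then-compare idea, using only the standard library. It assumes that a "word" is a run of letters and ASCII apostrophes, which is a simplification. The name hasWord is made up for this example, and a real text segmentation library would handle many more cases.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasWord reports whether word occurs in text as a whole word,
// ignoring case. Assumption: words consist of letters and ASCII
// apostrophes; everything else separates words.
func hasWord(text, word string) bool {
	isSeparator := func(r rune) bool {
		return !unicode.IsLetter(r) && r != '\''
	}
	for _, w := range strings.FieldsFunc(text, isSeparator) {
		// Strip apostrophes used as quotes around a word ('can'),
		// while leaving inner ones (can't) in place.
		w = strings.Trim(w, "'")
		if strings.EqualFold(w, word) {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{"I can run fast", "I cannot run fast", "I can't run fast"} {
		fmt.Printf("%q -> %v\n", s, hasWord(s, "can"))
	}
}
```

Because the apostrophe is kept inside a word, "can't" stays a single token and no longer matches "can".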

I don't know how to write a regular expression that accepts "can" but rejects "can't" in a sentence using \b alone, and the regexp package does not support negative lookahead.
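One possible workaround, offered as a sketch rather than a definitive answer, is to spell the boundary out by hand instead of relying on \b or lookahead: require the start of the text or a character that is neither a letter nor an apostrophe before "can", and the same kind of character or the end of the text after it. The names wholeWordCan and containsWordCan are hypothetical. The pattern only treats the ASCII apostrophe specially, and it rejects "can" wrapped in single quotes.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeWordCan matches "can" (case-insensitive) only when it is
// delimited by the start or end of the text, or by a character that
// is neither a letter nor an ASCII apostrophe. It needs no lookahead,
// so it works with Go's RE2-based regexp package.
var wholeWordCan = regexp.MustCompile(`(?i)(?:^|[^\p{L}'])can(?:[^\p{L}']|$)`)

// containsWordCan is a hypothetical helper name for this sketch.
func containsWordCan(s string) bool {
	return wholeWordCan.MatchString(s)
}

func main() {
	for _, s := range []string{"I can run fast", "I cannot run fast", "I can't run fast", "Can I go?"} {
		fmt.Printf("%q -> %v\n", s, containsWordCan(s))
	}
}
```

The match consumes the delimiter characters, which is fine for a yes/no check with MatchString but would need a capture group if you wanted the position of the word itself.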

CodePudding user response:

I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).

I'm trying to implement a filter for a given set of words.

Here's a proof-of-concept (POC) solution. It splits the text into words, normalizes each word, and looks it up in a set of filter words, so it can distinguish between "can", "cannot", and "can't".

package main

import (
    "fmt"
    "strings"
    "unicode"
)

func newFilter(words []string) map[string]struct{} {
    filter := make(map[string]struct{}, len(words))
    for _, word := range words {
        if len(word) > 0 {
            filter[strings.ToLower(word)] = struct{}{}
        }
    }
    return filter
}

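// applyFilter reports whether any word in text is in the filter set.
func applyFilter(text string, filter map[string]struct{}) bool {
    const rApostrophe = '\u0027'
    const sApostrophe = string(rApostrophe)
    const rSoftHyphen = '\u00AD'
    const sSoftHyphen = string(rSoftHyphen)
    // split treats every rune as a separator except letters,
    // soft hyphens, and apostrophes, so "can't" stays one word.
    split := func(r rune) bool {
        return !unicode.IsLetter(r) && r != rSoftHyphen && r != rApostrophe
    }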

    words := strings.FieldsFunc(text, split)
    for _, word := range words {
        if strings.Contains(word, sSoftHyphen) {
            word = strings.ReplaceAll(word, sSoftHyphen, "")
        }
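        // Drop a trailing apostrophe (plural possessive such as "dogs'").
        if strings.HasSuffix(word, sApostrophe) {
            word = word[:len(word)-1]
        }
        // Drop a trailing "'s" (singular possessive such as "dog's").
        if strings.HasSuffix(word, sApostrophe+"s") {
            word = word[:len(word)-2]
        }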
        word = strings.ToLower(word)
        if _, ok := filter[word]; ok {
            return true
        }
    }
    return false
}

func main() {
    filter := newFilter([]string{"can"})
    text := "I can run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I cannot run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I can't run fast"
    fmt.Println(applyFilter(text, filter))

    filter = newFilter([]string{"cannot", "can't"})
    text = "I can run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I cannot run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I can't run fast"
    fmt.Println(applyFilter(text, filter))
}

https://go.dev/play/p/hGkStvA7ZDi
