How to check if a string contains a word and ignore special characters?

Time:10-09

I need to check whether a sentence contains any of the words from a string array, but the check should ignore special characters such as commas. The result, however, should be the original sentence.

For example, I have the sentence "Tesla car price is $ 250,000." and my word array is wrdList = new string[] { "250000", "Apple", "40.00" };

I wrote the line of code below, but it returns no result because "250,000" and "250000" do not match.

List<string> res = row.ItemArray.Where(itmArr => wrdList.Any(wrd => itmArr.ToString().ToLower().Contains(wrd.ToString()))).OfType<string>().ToList();

One important point: if a sentence matches a word in the array, I need to get the original sentence back.

For example, the result should be "Tesla car price is $ 250,000.", not "Tesla car price is $ 250000."

CodePudding user response:

The first option to consider for most text matching problems is to use regular expressions. This will work for your problem. The core part of the solution is to construct an appropriate regular expression to match what you need to match.

You have a list of words, but I'll focus on just one word. Your requirements specify that you want to match on a "word". So to start with, you can use the "word boundary" pattern \b. To match the word "250000", the regular expression would be \b250000\b.
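To make the word-boundary behaviour concrete, here is a minimal sketch using Regex.IsMatch. The sample strings and the class name WordBoundaryDemo are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    public static void Main()
    {
        // \b matches between a word character and a non-word character
        // (or the start/end of the string).
        Console.WriteLine(Regex.IsMatch("price is 250000.", @"\b250000\b"));  // True
        // "1250000" contains "250000", but there is no boundary before the 2.
        Console.WriteLine(Regex.IsMatch("price is 1250000.", @"\b250000\b")); // False
        // The comma breaks the literal sequence, so the plain pattern fails.
        Console.WriteLine(Regex.IsMatch("price is 250,000.", @"\b250000\b")); // False
    }
}
```

The last case is exactly the problem in the question: the boundaries are fine, but the pattern still has to tolerate the comma inside the word.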

Your requirements also specify that the word can "contain" characters that are "special". For it to work correctly, you need to be clear what it means to "contain" and which characters are "special".

For the "contain" requirement, I'll assume you mean that the special character can be between any two characters in the word, but not the first or last character. So for the word "250000", any of the question marks in this string could be a special character: "2?5?0?0?0?0".

For the "special" requirement, the right pattern depends on what you need to ignore. If it is any punctuation, you can use the Unicode category \p{P}. If you need a specific list of special characters, you can use a character group. For example, if your only special character is the comma, the character group would be [,].
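As a rough illustration of the difference between the two options, the sketch below allows one optional special character in the middle of "250000". The patterns are simplified to a single optional slot and the class name SpecialCharacterDemo is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecialCharacterDemo
{
    public static void Main()
    {
        // Simplified patterns: one optional special character after "250".
        string commaOnly = @"\b250[,]?000\b";
        string anyPunctuation = @"\b250\p{P}?000\b";

        Console.WriteLine(Regex.IsMatch("250,000", commaOnly));      // True
        Console.WriteLine(Regex.IsMatch("250.000", commaOnly));      // False
        Console.WriteLine(Regex.IsMatch("250.000", anyPunctuation)); // True

        // Note: '$' is a currency symbol, not punctuation, so \p{P} skips it.
        Console.WriteLine(Regex.IsMatch("$", @"\p{P}"));             // False
    }
}
```

The character group is the stricter choice; \p{P} is broader and would also treat "250.000" or "250-000" as the same word.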

To put all that together, you would create a function to build the appropriate regular expression for each target word, then use that to check your sentence. Something like this:

using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string sentence = "Tesla car price is $ 250,000.";
        var targetWords = new string[] { "250000", "350000", "400000" };
        Console.WriteLine($"Contains target word? {ContainsTarget(sentence, targetWords)}");
    }

    // True if the sentence contains at least one of the target words.
    private static bool ContainsTarget(string sentence, string[] targetWords)
    {
        return targetWords.Any(targetWord => ContainsTarget(sentence, targetWord));
    }

    // True if the sentence contains the target word, ignoring special characters inside it.
    private static bool ContainsTarget(string sentence, string targetWord)
    {
        string targetWordExpression = TargetWordExpression(targetWord);
        var re = new Regex(targetWordExpression);
        return re.IsMatch(sentence);
    }

    // Builds a pattern such as \b2[,]?5[,]?0[,]?0[,]?0[,]?0\b for "250000".
    private static string TargetWordExpression(string targetWord)
    {
        var sb = new StringBuilder();
        // If special characters means a specific list, use this:
        string specialCharacterMatch = "[,]?";
        // If special characters means any punctuation, then you can use this:
        //string specialCharacterMatch = "\\p{P}?";

        bool any = false;
        foreach (char c in targetWord)
        {
            if (any)
            {
                // Allow one optional special character between two characters of the word.
                sb.Append(specialCharacterMatch);
            }
            any = true;
            // Escape the character so that words like "40.00" are matched literally.
            sb.Append(Regex.Escape(c.ToString()));
        }

        return $"\\b{sb}\\b";
    }
}

Working code: https://dotnetfiddle.net/5UJSur
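The question also asks for the original sentence back, not just a yes/no answer. Since the regular expression only tests the sentence and never modifies it, one way to do that is to filter the sentences with Where. The sketch below assumes the comma-only character group; the names OriginalSentenceFilter, BuildTargetRegex and FilterMatching are hypothetical, and RegexOptions.IgnoreCase is an added assumption so that a word like "Apple" matches regardless of case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OriginalSentenceFilter
{
    // Builds e.g. \b2[,]?5[,]?0[,]?0[,]?0[,]?0\b for "250000".
    public static Regex BuildTargetRegex(string targetWord)
    {
        // Escape each character so "40.00" is matched literally.
        IEnumerable<string> escapedChars = targetWord.Select(c => Regex.Escape(c.ToString()));
        string body = string.Join("[,]?", escapedChars);
        // IgnoreCase is an assumption, not part of the original answer.
        return new Regex($@"\b{body}\b", RegexOptions.IgnoreCase);
    }

    // Returns the untouched original sentences that contain any target word.
    public static List<string> FilterMatching(IEnumerable<string> sentences, IEnumerable<string> targetWords)
    {
        List<Regex> patterns = targetWords.Select(w => BuildTargetRegex(w)).ToList();
        return sentences.Where(s => patterns.Any(p => p.IsMatch(s))).ToList();
    }

    public static void Main()
    {
        var sentences = new[] { "Tesla car price is $ 250,000.", "Bike price is $ 1,200." };
        var words = new[] { "250000", "Apple", "40.00" };

        foreach (string s in FilterMatching(sentences, words))
        {
            Console.WriteLine(s); // prints: Tesla car price is $ 250,000.
        }
    }
}
```

With a DataRow, the same filter could be applied to row.ItemArray.OfType<string>() in place of the sample array.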

CodePudding user response:

How about using Replace(",", "") to strip the commas before comparing?

itmArr.ToString().ToLower().Replace(",", "").Contains(wrd.ToString())

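Side note: .ToLower() isn't needed for numeric words because digits have no case, and wrd is already a string, so it doesn't need .ToString(). The .ToString() on itmArr is still needed if itmArr comes from row.ItemArray, which holds objects.

So the result could also be

itmArr.ToString().Replace(",", "").Contains(wrd)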

https://dotnetfiddle.net/A2zN0d
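Because Replace returns a new string, the comparison can run on the stripped copy while the original sentence is what gets returned. A minimal sketch, assuming an object[] as a stand-in for row.ItemArray; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReplaceFilterDemo
{
    // Compares against a comma-free copy, but returns the original strings.
    public static List<string> FilterByStrippedCommas(IEnumerable<object> items, string[] wrdList)
    {
        return items
            .OfType<string>() // skip non-string cells
            .Where(s => wrdList.Any(wrd => s.Replace(",", "").Contains(wrd)))
            .ToList();
    }

    public static void Main()
    {
        string[] wrdList = { "250000", "Apple", "40.00" };
        // Stand-in for row.ItemArray (an assumption for this sketch).
        object[] itemArray = { "Tesla car price is $ 250,000.", 42, "Total: 1,250,000 units" };

        foreach (string s in FilterByStrippedCommas(itemArray, wrdList))
        {
            Console.WriteLine(s);
        }
    }
}
```

Note that this is a plain substring check, so "Total: 1,250,000 units" is also returned because "1250000" contains "250000". If whole-word matching matters, the regular expression approach is the safer choice.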
