I have an array of strings, and each string can contain one word or more. When a string is a single word, it's easy to remove it if it is a stop word. What I can't figure out is how to remove a string with multiple words only when all of its words are in the stop words list. I'd prefer to solve it with LINQ.
Imagine I have this array of strings:
then use
then he
the image
and the
should be in
should be written
I want to get only:
then use
the image
should be written
So, a line whose words are all stop words should be removed, while a line that mixes stop words with other words should be kept.
My stop words array:
string[] stopWords = {"a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "then", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and"};
Thank you.
Answer:
One way to solve this problem would be to do the following:
string[] stopWords = { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and" };
string input = """"
then use
then he
the image
and the
should be in
should be written
"""";
var array = input.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
var filteredArray = array.Where(x => x.Split(' ').Any(y => !stopWords.Contains(y))).ToList();
var result = string.Join(Environment.NewLine, filteredArray);
Console.WriteLine(result);
The first two statements just set up the data.
The third statement converts the string into an array of lines by splitting on both '\r' and '\n'. This works whether the text has Windows (\r\n) or Linux (\n) line endings, and RemoveEmptyEntries drops the empty entries that appear between \r and \n.
The fourth statement processes each line by splitting it on spaces (which gives us the individual words) and then checks whether any word does not exist in the stopWords list. If at least one word is not a stop word, the Where condition is satisfied and the whole line is kept in filteredArray.
The fifth statement joins the remaining lines to form the final result string.
The result should look something like below:
then use
then he
the image
should be written
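Note that the stopWords list used here contains the word them but not then, so the second result line (then he) is not removed. The list in your question does include then; with it, then he is filtered out as well and you get exactly the three lines you expected.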
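A self-contained sketch of the same Where/Any approach, assuming the stop word list from the question (which includes then). Copying the list into a HashSet<string> for constant-time lookups is an illustrative choice, and the local function name KeepMixedLines is hypothetical; neither comes from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stop words copied from the question; note that this list includes "then".
var stopWords = new HashSet<string>
{
    "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "then",
    "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and"
};

string[] lines = { "then use", "then he", "the image", "and the", "should be in", "should be written" };

// Hypothetical helper: keep a line only if at least one of its words is not a stop word.
List<string> KeepMixedLines(IEnumerable<string> input) =>
    input.Where(line => line.Split(' ', StringSplitOptions.RemoveEmptyEntries)
                            .Any(word => !stopWords.Contains(word)))
         .ToList();

var kept = KeepMixedLines(lines);
Console.WriteLine(string.Join(Environment.NewLine, kept));
```

With then in the list, this prints then use, the image and should be written, which matches the output the question asks for.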
Answer:
Use the Intersect method as follows:
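var result = new List<string>();
foreach (string line in WordsList) // WordsList is your input array of strings
{
    // Split the entry into its individual words.
    List<string> splitData = line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries).ToList();
    // Intersect returns distinct words, so compare with the distinct word count.
    bool allOfWordsIsInStopWords = splitData.Intersect(stopWords).Count() == splitData.Distinct().Count();
    if (!allOfWordsIsInStopWords) result.Add(line);
}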
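A self-contained sketch that applies the Intersect check inside a LINQ Where, assuming the stop word list from the question. Because Intersect yields distinct elements, the check compares against the distinct word count; the extra sample line "the the" and the local function name IsAllStopWords are illustrative additions, not taken from the question or the answer.

```csharp
using System;
using System.Linq;

// Stop words from the question.
string[] stopWords = { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "then",
                       "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and" };

// "the the" is an extra (hypothetical) sample line that exercises the duplicate-word case.
string[] wordsList = { "then use", "then he", "the image", "and the", "should be in", "should be written", "the the" };

// True when every word of the line is a stop word.
// Intersect yields distinct words, so compare with the distinct word count.
bool IsAllStopWords(string line)
{
    var words = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    return words.Intersect(stopWords).Count() == words.Distinct().Count();
}

var kept = wordsList.Where(line => !IsAllStopWords(line)).ToArray();
Console.WriteLine(string.Join(Environment.NewLine, kept));
```

Comparing against the raw word count instead would make "the the" look like a mixed line (1 distinct stop word versus 2 words), so it would be kept.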
Answer:
According to the original problem description:
I have an array of word(s), it can contain one word or more. In case of one word, it's easy to remove it, but when choose to remove multiple words if they are ALL in the stop words list is difficult for me to figure it out. I prefer solving it with LINQ.
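The following code addresses the emphasized part of that description: it removes a string only when ALL of its words are stop words, and it uses LINQ as requested.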
using System.Text.RegularExpressions;
string[] stopWords = { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and" };
string[] inputStrings = { "then use", "then he", "the image", "and the", "should be in", "should be written" };
// Split on runs of one or more whitespace characters.
var wordSeparatorPattern = new Regex(@"\s+");
var outputStrings = inputStrings.Where((words) =>
{
return wordSeparatorPattern.Split(words).Any((word) =>
{
return !stopWords.Contains(word);
});
});
foreach (var item in outputStrings)
{
Console.WriteLine(item);
}
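If the input lines can have leading or trailing whitespace, Regex.Split returns an empty string at the start (or end) of the array when the pattern matches there. An empty string is never a stop word, so a line such as " and the " would be kept. A sketch that drops empty tokens before the check, assuming the question's stop word list; the padded sample lines are illustrative, not from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Stop words from the question.
string[] stopWords = { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "then",
                       "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and" };

// Padded sample lines (illustrative) to show the leading/trailing whitespace case.
string[] inputStrings = { " then use", " and the ", "should be written  " };

var wordSeparatorPattern = new Regex(@"\s+");

// Keep a line if any non-empty token is not a stop word.
var outputStrings = inputStrings
    .Where(words => wordSeparatorPattern.Split(words)
        .Where(word => word.Length > 0)          // drop empty tokens produced by leading/trailing whitespace
        .Any(word => !stopWords.Contains(word)))
    .ToList();

foreach (var item in outputStrings)
{
    Console.WriteLine(item);
}
```

Trimming each line before splitting would work just as well; filtering empty tokens simply keeps the regex-based structure of the answer.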