If I have a list of strings like
var MyList = new List<string>
{
"substring1", "substring2", "substring3", "substring4", "substring5"
};
is there any efficient way to determine which elements of that list are contained in the following string
"substring1 is the substring2 document that was processed electronically"
In this case the result should be
var MySubList = new List<string>
{
"substring1", "substring2"
};
CodePudding user response:
We can use LINQ's Where to check, for every substring, whether the large string Contains it:
var MyList = new List<string>
{
"substring1", "substring2", "substring3", "substring4", "substring5"
};
var Text = "substring1 is the substring2 document that was processed electronically";
var output = MyList.Where(x => Text.Contains(x)).ToList();
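A minimal sketch of the same idea wrapped in a reusable method, in case the comparison rule (for example, ignoring case) needs to be chosen by the caller. SubstringFilter and FindContained are hypothetical names, not part of the answer above, and the sketch swaps Contains for IndexOf with a StringComparison so that the comparison rule is explicit:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubstringFilter
{
    // Hypothetical helper: returns every candidate that occurs anywhere in text.
    // The default is an ordinal, case-sensitive search, like string.Contains(string).
    public static List<string> FindContained(
        string text,
        IEnumerable<string> candidates,
        StringComparison comparison = StringComparison.Ordinal)
    {
        // IndexOf returns -1 when the candidate is not found in text.
        return candidates
            .Where(candidate => text.IndexOf(candidate, comparison) >= 0)
            .ToList();
    }
}
```

Calling SubstringFilter.FindContained(Text, MyList) should give the same list as the one-liner above; passing StringComparison.OrdinalIgnoreCase as the third argument would also match "SUBSTRING1".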
CodePudding user response:
- Split the Text on whitespace
- Sort the words alphabetically
- Create a unique list from that
var words = Text.Split(" ").OrderBy(word => word).Distinct().ToList();
- Create an accumulator collection for the matches
- Create two index variables (one for the words, one for the patterns)
List<string> matches = new();
int patternIdx = 0, wordIdx = 0;
- Iterate through the lists until you reach the end of either collection
while(patternIdx < patterns.Count && wordIdx < words.Count)
{
}
- Perform a string comparison
- Advance index variable(s) based on the comparison result
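int comparison = string.Compare(patterns[patternIdx], words[wordIdx]);
switch(comparison)
{
// The pattern sorts after the word: move on to the next word
case > 0: wordIdx++; break;
// The pattern sorts before the word: move on to the next pattern
case < 0: patternIdx++; break;
// Equal: record the match and advance both indexes
default:
{
matches.Add(patterns[patternIdx]);
wordIdx++;
patternIdx++;
break;
}
}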
Here I've used relational patterns in a switch statement, a feature introduced in C# 9.
If you can't use C# 9, an if ... else if ... else block would work just as well.
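A minimal sketch of that if ... else if ... else variant, written without C# 9 features. SortedMerge and FindMatches are hypothetical names, not part of the answer. The sketch also assumes that both lists must be sorted with the same comparer for the merge to be correct, so it sorts the patterns as well and uses StringComparer.Ordinal throughout, which is a choice made here rather than something the answer prescribes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedMerge
{
    // Hypothetical helper: returns the patterns that appear as whole words in text.
    public static List<string> FindMatches(string text, IEnumerable<string> patternSource)
    {
        // Both lists are de-duplicated and sorted with the same ordinal comparer,
        // because the merge loop relies on a shared ordering.
        var comparer = StringComparer.Ordinal;
        var words = text.Split(' ').Distinct().OrderBy(w => w, comparer).ToList();
        var patterns = patternSource.Distinct().OrderBy(p => p, comparer).ToList();

        var matches = new List<string>();
        int patternIdx = 0, wordIdx = 0;
        while (patternIdx < patterns.Count && wordIdx < words.Count)
        {
            int comparison = comparer.Compare(patterns[patternIdx], words[wordIdx]);
            if (comparison > 0)
            {
                wordIdx++;       // the word sorts before the pattern
            }
            else if (comparison < 0)
            {
                patternIdx++;    // the pattern sorts before the word
            }
            else
            {
                matches.Add(patterns[patternIdx]);
                wordIdx++;
                patternIdx++;
            }
        }
        return matches;
    }
}
```

Note that this approach compares whole words, so a pattern followed by punctuation in the text (such as "substring1,") or embedded inside a longer word would not be reported, unlike a Contains-based search.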
For the sake of completeness, here is the whole code:
var Text = "substring1 is the substring2 document that was processed electronically";
var words = Text.Split(" ").OrderBy(x => x).Distinct().ToList();
// The merge below relies on patterns being sorted already (it is here)
var patterns = new List<string> { "substring1", "substring2", "substring3", "substring4", "substring5" };
List<string> matches = new();
int patternIdx = 0, wordIdx = 0;
while(patternIdx < patterns.Count && wordIdx < words.Count)
{
int comparison = string.Compare(patterns[patternIdx], words[wordIdx]);
switch(comparison)
{
case > 0: wordIdx++; break;
case < 0: patternIdx++; break;
default:
{
matches.Add(patterns[patternIdx]);
wordIdx++;
patternIdx++;
break;
}
}
}