I would like to get the bare minimum amount of words that would be needed to create a collection of strings. For example:
"hi hello hello", "hello", "bye", "bye bye", "hello hello",
The intended output for this example would be string[] {"bye", "bye", "hello", "hello", "hi"} in no particular order.
I did a string.Join() on the starting array, and I wanted to use a HashSet to get the unique strings, but for some reason I couldn’t import it.
My goal was to get the highest count for each word, and then duplicate each word into the final string array.
CodePudding user response:
- Split each string into words, then group same words together.
// (hi), (hello hello)
// (hello)
// (bye)
// (bye bye)
// (hello hello)
var strs = new string[]{"hi hello hello", "hello","bye", "bye bye", "hello hello"};
var result = strs.Select(str => str.Split(' ').GroupBy(word => word))
- Flatten the groups and count the occurrence
// hi => 1
// hello => 2
// hello => 1
// bye => 1
// bye => 2
// hello => 2
.SelectMany(groups => groups)
.Select(group => KeyValuePair.Create(group.Key, group.Count()))
- Group the results again and find the highest count of each group, use
Enumerable.Repeat
to generate duplicates for each word.
// hi => 1
// hello => 2, hello => 1, hello => 2
// bye => 1, bye => 2
.GroupBy(group => group.Key)
.SelectMany(group => Enumerable.Repeat(group.Key, group.Max(g => g.Value)))
.ToArray();
CodePudding user response:
Here's an example. I first get a list of distinct words in each of the original strings. Then I check each of the original strings for the number of times each of the distinct word appears. The number of times the word occurs is recorded to a dictionary. And finally, we loop through the dictionary and repeat each of the words we found by the number of time we found an occurance of the word.
string[] strList = new string[] { "hi hello hello", "hello", "bye", "bye bye", "hello hello" };
List<string> wordList = new List<string>();
//Break each string into an array of words.
foreach (string s in strList) {
wordList.AddRange(s.Split(' '));
}
//Get a list of distinct words.
List<string> distinctWords = wordList.Distinct().ToList();
//Dictionary to hold the minimal count of words required to comprise the original strings
Dictionary<string, int> wordMinCountDict = new Dictionary<string, int>();
//Loop through each distinct word.
//Find the number of times it occurs in each of the original strings.
//Collect the highest count of the number of times the word occurs in the string.
foreach(string w in distinctWords)
{
foreach(string s in strList)
{
int instanceCount = s.Split(' ').ToList().FindAll(p => p == w).Count();
if (instanceCount > 0)
{
if (wordMinCountDict.TryGetValue(w, out int i))
{
if (instanceCount > i)
{
wordMinCountDict.Remove(w);
wordMinCountDict.Add(w, instanceCount);
}
}
else
{
wordMinCountDict.Add(w, instanceCount);
}
}
}
}
//Create a new list with the number of occurrences of each word.
List<string> finalWords = new List<string>();
foreach(KeyValuePair<string, int> kvp in wordMinCountDict)
{
if (kvp.Value < 1) continue;
finalWords.AddRange(Enumerable.Repeat(kvp.Key, kvp.Value).ToList());
}
//Use .ToArray() to convert finalWords to final array. (not neccessary in the next statement but left in as an example.)
Console.WriteLine(string.Join(" ", finalWords.ToArray()));