Home > Net >  Find minimum amount of words needed to make strings
Find minimum amount of words needed to make strings

Time:11-23

I would like to get the bare minimum amount of words that would be needed to create a collection of strings. For example:

"hi hello hello", "hello", "bye", "bye bye", "hello hello",

The intended output for this example would be string[] {"bye", "bye", "hello", "hello", "hi"} in no particular order.

I did a string.Join() on the starting array, and I wanted to use a HashSet to get the unique strings, but for some reason I couldn’t import it.

My goal was to get the highest count for each word, and then duplicate each word into the final string array.

CodePudding user response:

  1. Split each string into words, then group same words together.
// (hi), (hello hello)
// (hello)
// (bye)
// (bye bye)
// (hello hello)
var strs = new string[]{"hi hello hello", "hello","bye", "bye bye", "hello hello"};
var result = strs.Select(str => str.Split(' ').GroupBy(word => word))
  1. Flatten the groups and count the occurrence
// hi => 1
// hello => 2
// hello => 1
// bye => 1
// bye => 2
// hello => 2
.SelectMany(groups => groups)
.Select(group => KeyValuePair.Create(group.Key, group.Count()))
  1. Group the results again and find the highest count of each group, use Enumerable.Repeat to generate duplicates for each word.
// hi => 1
// hello => 2, hello => 1, hello => 2
// bye => 1, bye => 2
.GroupBy(group => group.Key)
.SelectMany(group => Enumerable.Repeat(group.Key, group.Max(g => g.Value)))
.ToArray();

CodePudding user response:

Here's an example. I first get a list of distinct words in each of the original strings. Then I check each of the original strings for the number of times each of the distinct word appears. The number of times the word occurs is recorded to a dictionary. And finally, we loop through the dictionary and repeat each of the words we found by the number of time we found an occurance of the word.

string[] strList = new string[] { "hi hello hello", "hello", "bye", "bye bye", "hello hello" };

List<string> wordList = new List<string>();

//Break each string into an array of words.
foreach (string s in strList) {
    wordList.AddRange(s.Split(' '));
}

//Get a list of distinct words.
List<string> distinctWords = wordList.Distinct().ToList();

//Dictionary to hold the minimal count of words required to comprise the original strings
Dictionary<string, int> wordMinCountDict = new Dictionary<string, int>();

//Loop through each distinct word.
//Find the number of times it occurs in each of the original strings.
//Collect the highest count of the number of times the word occurs in the string.
foreach(string w in distinctWords)
{
    foreach(string s in strList)
    {
        int instanceCount = s.Split(' ').ToList().FindAll(p => p == w).Count();
        if (instanceCount > 0)
        {
            if (wordMinCountDict.TryGetValue(w, out int i))
            {
                if (instanceCount > i)
                {
                    wordMinCountDict.Remove(w);
                    wordMinCountDict.Add(w, instanceCount);
                }
            }
            else
            {
                wordMinCountDict.Add(w, instanceCount);
            }
        }
    }
}

//Create a new list with the number of occurrences of each word.
List<string> finalWords = new List<string>();
foreach(KeyValuePair<string, int> kvp in wordMinCountDict)
{
    if (kvp.Value < 1) continue;
    finalWords.AddRange(Enumerable.Repeat(kvp.Key, kvp.Value).ToList());
}

//Use .ToArray() to convert finalWords to final array. (not neccessary in the next statement but left in as an example.)
Console.WriteLine(string.Join(" ", finalWords.ToArray()));
  •  Tags:  
  • c#
  • Related