As a preface, very new to Java and extremely new to HashMaps
.
I have a HashMap
that has strings as keys and associated integer values that represent frequency. I would like to know the most efficient way to find the key associated with the max integer value.
Here is the code I have so far:
public String most(String[] sentences) {
HashMap<String, Integer> wordFreq = new HashMap<String, Integer>();
for(String sentence : sentences) {
String[] words = sentence.split(" ");
for (String word : words) {
wordFreq.merge(word, 1, Integer::sum);
}
}
//TODO find key associated with max integer value in map
}
CodePudding user response:
You can do it like this. Stream entrySet
of the map and then get the maximum entry by using maxBy
on the value of the entry. Since maxBy
returns an Optional
, you can then use map
to get the value. Return an informatory message if no value exists.
String result = wordFreq.entrySet().stream()
.collect(Collectors.maxBy(Entry.comparingByValue()))
.map(Entry::getKey).orElse("No Value Found");
And as was commented on you could also do this. I tend to use the first one out of habit.
String result = wordFreq.entrySet().stream()
.max(Entry.comparingByValue())
.map(Entry::getKey).orElse("No Value Found");
You could also use other ways to optimize this but since you constructed a map I figured that is how you wanted to do it. If you need a map of words, you may want to construct it outside of the method. Then just pass the map to the method instead of an array of sentences. That way you can use the map for other things.
Since you haven't yet had a chance to respond to my question, I will offer a solution regarding ties for the maximum occurrence of a word. Similar as before except first you return the maximum value found. The you can stream the entrySet
, filtering for the entries that have the maximum value. Then just collect into a list.
int max = wordFreq.entrySet().stream()
.max(Entry.comparingByValue()).map(Entry::getValue)
.orElse(0);
List<String> wordFreq.entrySet().stream()
.filter(e -> e.getValue() == max).map(Entry::getKey)
.toList();
}
Lastly, you use stream constructs to create the initial frequency map. I am using \\s
as the regex as there could be more than one space. The splitAsStream
will apply the pattern and the toMap
is similar to a merge
in functionality.
Map<String, Integer> wordFreq = Arrays.stream(sentences)
.flatMap(s -> Pattern.compile("\\s ").splitAsStream(s))
.collect(Collectors.toMap(x -> x, x -> 1,
Integer::sum));
CodePudding user response:
The most frequent word can be found while accumulating the values of the Map. That would allow to avoid the overheads of iterating over the map entries.
For that, we need to introduce a couple of variables an one condition into your existing code.
Method merge()
returns a current value associated with the offered key. Which allows to track the maximum value and update the most frequent word in the input accordingly.
public String most(String[] sentences) {
Map<String, Integer> wordFreq = new HashMap<>();
String most = "no data";
int max = 0;
for(String sentence : sentences) {
String[] words = sentence.split(" ");
for (String word : words) {
int current = wordFreq.merge(word, 1, Integer::sum);
if (current > max) {
most = word;
max = current;
}
}
}
return most;
}
In case when there are multiple words having the same frequency in the input (which happens to be maximal) and you want to get all of them, the return type should be collection of strings, for instance List<String>
. And instead of maintaining the most frequent word as a string, we need to maintain a collection of strings.
public List<String> most(String[] sentences) {
Map<String, Integer> wordFreq = new HashMap<>();
List<String> mostFrequent = new ArrayList<>();
int max = 0;
for(String sentence : sentences) {
String[] words = sentence.split(" ");
for (String word : words) {
int current = wordFreq.merge(word, 1, Integer::sum);
if (current >= max) {
if (current > max) {
mostFrequent = new ArrayList<>();
max = current;
}
mostFrequent.add(word);
}
}
}
return mostFrequent;
}
CodePudding user response:
you also might consider using a bi-directional map implementation like apache commons collections BidiMap or guava's BiMap
But as your Map values are not unique, for high performance you need some sort of index to search for a key. There are no indexes that work in both directions, so actually you need 1 Map (for key to value) and 1 Multimap (for value to keys), and a wrapper that will maintain both in consistent state on any mutation (insert, upeate, delete,...)