I have this code, which prints a list of words sorted by key (alphabetically) from counts, a ConcurrentHashMap that stores words as keys and their frequencies as values.
// Method to create a stopword list with the most frequent words from the lemmas key in the json file
private static List<String> StopWordsFile(ConcurrentHashMap<String, String> lemmas) {
    // counts stores each word and its frequency
    ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
    // corpus is an array list for all the individual words
    ArrayList<String> corpus = new ArrayList<String>();
    for (Entry<String, String> entry : lemmas.entrySet()) {
        String line = entry.getValue().toLowerCase();
        line = line.replaceAll("\\p{Punct}", " ");
        line = line.replaceAll("\\d+", " ");
        line = line.replaceAll("\\s+", " ");
        line = line.trim();
        String[] value = line.split(" ");
        List<String> words = new ArrayList<String>(Arrays.asList(value));
        corpus.addAll(words);
    }
    // count all the words in the corpus and store each word with its frequency in counts
    for (String word : corpus) {
        if (counts.keySet().contains(word)) {
            counts.put(word, counts.get(word) + 1);
        } else {
            counts.put(word, 1);
        }
    }
    // Create a list to store all the words with their frequency and sort it by values.
    List<Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
    List<String> stopwordslist = new ArrayList<>(counts.keySet()); // this works but counts.values() gives an error
    Collections.sort(stopwordslist);
    System.out.println("List after sorting: " + stopwordslist);
So the output is:
List after sorting: [a, abruptly, absent, abstractmap, accept,...]
How can I sort them by value as well? When I use List<String> stopwordslist = new ArrayList<>(counts.values());
I get an error:
- Cannot infer type arguments for ArrayList<>
I guess that is because ArrayList can store <String> but not <String, Integer>, and it gets confused.
I have also tried to do it with a custom Comparator like so:
Comparator<Entry<String, Integer>> valueComparator = new Comparator<Entry<String,Integer>>() {
    @Override
    public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) {
        String v1 = e1.getValue();
        String v2 = e2.getValue();
        return v1.compareTo(v2);
    }
};
List<Entry<String, Integer>> stopwordslist = new ArrayList<Entry<String, Integer>>();
// sorting HashMap by values using comparator
Collections.sort(counts, valueComparator)
which gives me another error:
The method sort(List<T>, Comparator<? super T>) in the type Collections is not applicable for the arguments (ConcurrentHashMap<String,Integer>, Comparator<Map.Entry<String,Integer>>)
How can I sort my list by values?
My expected output is something like:
[the, of, value, v, key, to, given, a, k, map, in, for, this, returns, if, is, super, null, specified, u, function, and, ...]
Answer:
Let’s go through all the issues in your code:
- Naming conventions. Method names should start with a lowercase letter.
- Unnecessary use of ConcurrentHashMap. For purely local use, as within your method, an ordinary HashMap will do. For parameters, just use the Map interface, so the caller can pass whatever Map implementation fits.
- Unnecessarily iterating over the entrySet(). When you’re only interested in the values, you don’t need to use entrySet() and call getValue() on every entry; you can iterate over values() in the first place. Likewise, you would use keySet() when you’re interested in the keys only. Only iterate over entrySet() when you need both key and value (or want to perform updates).
- Don’t replace pattern matches with spaces just to split on the spaces afterwards. Specify the (combined) pattern directly to split, i.e. line.split("[\\p{Punct}\\d\\s]+").
- Don’t use List<String> words = new ArrayList<String>(Arrays.asList(value)); unless you specifically need the features of an ArrayList. Otherwise, just use List<String> words = Arrays.asList(value); But when the only thing you’re doing with the list is calling addAll on another collection, you can use Collections.addAll(corpus, value); without the List detour.
- Don’t use counts.keySet().contains(word), as you can simply use counts.containsKey(word). But you can simplify the entire
  if (counts.containsKey(word)) { counts.put(word, counts.get(word) + 1); } else { counts.put(word, 1); }
  to
  counts.merge(word, 1, Integer::sum);
The points above yield
ArrayList<String> corpus = new ArrayList<>();
for (String line : lemmas.values()) {
    String[] value = line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+");
    Collections.addAll(corpus, value);
}
for (String word : corpus) {
    counts.merge(word, 1, Integer::sum);
}
But there is no point in performing two loops, where the first only stores everything in a potentially large list that is then iterated a single time. You can perform the second loop’s operation right in the first (now the only) loop and get rid of the list:
for (String line : lemmas.values()) {
    for (String word : line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
        counts.merge(word, 1, Integer::sum);
    }
}
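One detail worth checking in the single-loop version is what split returns at the edges. A minimal sketch, assuming a hypothetical helper class WordCountSketch and made-up input lines (neither is part of the original code): when a line starts with a delimiter, or is empty, split can yield an empty string, so this variant skips empty tokens before calling merge.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, used only to illustrate the counting loop.
class WordCountSketch {

    // Counts words across all lines, splitting on runs of punctuation, digits, or whitespace.
    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
                if (word.isEmpty()) {
                    continue; // split yields "" for an empty line or a leading delimiter
                }
                counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input; the second line starts with a quote character.
        Map<String, Integer> counts =
                countWords(Arrays.asList("The map, the key!", "\"Key\" of 2 maps"));
        System.out.println(counts.get("the"));      // 2
        System.out.println(counts.get("key"));      // 2
        System.out.println(counts.containsKey("")); // false
    }
}
```

Whether the empty-token check is needed depends on the input data; without it, an empty string could end up as a key in counts.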
You already acknowledged in your first variant that you can’t sort a map directly, by copying the map into a list and sorting the list. In the second variant, you created a List<Entry<String, Integer>> but then didn’t use it at all and instead tried to pass the map to sort. (By the way, since Java 8, you can invoke sort directly on a List; there is no need to call Collections.sort.)
You have to keep copying the map data into a list and sorting the list. For example:
List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
list.sort(Map.Entry.comparingByValue());
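To make the result of that sort concrete, here is a small sketch with made-up counts. The class name SortByValueSketch and the sample data are assumptions for illustration only. It copies the entries into a list and sorts them with Map.Entry.comparingByValue(), which orders by ascending value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; the counts below are invented sample data.
class SortByValueSketch {

    // Copies the map's entries into a list and sorts them by ascending value.
    static List<Map.Entry<String, Integer>> sortedByValue(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        list.sort(Map.Entry.comparingByValue());
        return list;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 5);
        counts.put("of", 3);
        counts.put("map", 1);
        System.out.println(sortedByValue(counts)); // [map=1, of=3, the=5]
    }
}
```

The least frequent word comes first here. Entries with equal values keep whatever relative order the map's iteration produced, since List.sort is stable.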
Now, you have to decide whether you change the return type to List<Map.Entry<String, Integer>> or copy the keys of the sorted entries to a new list.
Taking all points together and staying with the original return type, the fixed code looks like:
private static List<String> stopWordsFile(Map<String, String> lemmas) {
    Map<String, Integer> counts = new HashMap<>();
    for (String line : lemmas.values()) {
        for (String word : line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
            counts.merge(word, 1, Integer::sum);
        }
    }
    List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
    list.sort(Map.Entry.comparingByValue());
    List<String> stopwordslist = new ArrayList<>();
    for (Map.Entry<String, Integer> e : list) stopwordslist.add(e.getKey());
    // System.out.println("List after sorting: " + stopwordslist);
    return stopwordslist;
}
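Note that comparingByValue() sorts in ascending order, so the least frequent words come first. If the goal is the order shown in the question's expected output (most frequent words first), one possible variant reverses the comparator. The sketch below makes that assumption. The class name StopWordsSketch, the alphabetical tie-break for equal counts, and the sample input are illustrative additions, not part of the original code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper class for a self-contained, runnable variant.
class StopWordsSketch {

    // Returns all words ordered by descending frequency; ties are broken alphabetically.
    static List<String> stopWordsFile(Map<String, String> lemmas) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lemmas.values()) {
            for (String word : line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
                if (!word.isEmpty()) { // skip empty tokens produced by split
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        // Assumption: most frequent first, then alphabetical for a deterministic result.
        list.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> stopwordslist = new ArrayList<>();
        for (Map.Entry<String, Integer> e : list) {
            stopwordslist.add(e.getKey());
        }
        return stopwordslist;
    }

    public static void main(String[] args) {
        Map<String, String> lemmas = new HashMap<>();
        lemmas.put("a", "the map of the key"); // made-up sample data
        lemmas.put("b", "the key");
        System.out.println(stopWordsFile(lemmas)); // [the, key, map, of]
    }
}
```

The explicit type arguments on comparingByValue and comparingByKey are there because the chained reversed() call gives the compiler no target type to infer them from.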