Home > Blockchain >  Problem sorting ConcurrentHashMap by values using java.util.Collections.sort() in Java
Problem sorting ConcurrentHashMap by values using java.util.Collections.sort() in Java

Time:09-02

I have this code which prints me a list of words sorted by keys (alphabetically) from counts, my ConcurrentHashMap which stores words as keys and their frequencies as values.

// Method to create a stopword list with the most frequent words from the lemmas key in the json file
   private static List<String> StopWordsFile(ConcurrentHashMap<String, String> lemmas) {

// counts stores each word and its frequency
       ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();

// corpus is an array list for all the individual words
           ArrayList<String> corpus = new ArrayList<String>();

           for (Entry<String, String> entry : lemmas.entrySet()) {
               
               String line = entry.getValue().toLowerCase();               
               line = line.replaceAll("\\p{Punct}", " ");
               line = line.replaceAll("\\d "," ");
               line = line.replaceAll("\\s ", " ");
               line = line.trim();
               String[] value = line.split(" ");

               List<String> words = new ArrayList<String>(Arrays.asList(value));
               corpus.addAll(words);

    }

           // count all the words in the corpus and store the words with each frequency i 
           //counts
           for (String word : corpus) {

               if (counts.keySet().contains(word)) {
                   counts.put(word, counts.get(word)   1);

               } else {counts.put(word, 1);}
}
// Create a list to store all the words with their frequency and sort it by values.
           List<Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());         
           
           List<String> stopwordslist = new ArrayList<>(counts.keySet()); # this works but counts.values() gives an error
           Collections.sort(stopwordslist);
           System.out.println("List after sorting: "  stopwordslist);

So the output is:

List after sorting: [a, abruptly, absent, abstractmap, accept,...]

How can I sort them by values as well? when I use List stopwordslist = new ArrayList<>(counts.values());

I get an error,

- Cannot infer type arguments for ArrayList<>

I guess that is because ArrayList can store < String > but not <String,Integer> and it gets confused.

I have also tried to do it with a custom Comparator like so:

           Comparator<Entry<String, Integer>> valueComparator = new Comparator<Entry<String,Integer>>() {
               @Override
               public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) {
                   String v1 = e1.getValue();
                   String v2 = e2.getValue();
                   return v1.compareTo(v2);
               }
           };  
           
           
           List<Entry<String, Integer>> stopwordslist = new ArrayList<Entry<String, Integer>>();
           // sorting HashMap by values using comparator 
           Collections.sort(counts, valueComparator)

which gives me another error,

The method sort(List<T>, Comparator<? super T>) in the type Collections is not applicable for the arguments (ConcurrentHashMap<String,Integer>, Comparator<Map.Entry<String,Integer>>)

how can I sort my list by values?

my expected output is something like

[the, of, value, v, key, to, given, a, k, map, in, for, this, returns, if, is, super, null, specified, u, function, and, ...]

CodePudding user response:

Let’s go through all the issues of your code

  1. Name conventions. Method names should start with a lowercase letter.

  2. Unnecessary use of ConcurrentHashMap. For a purely local use like within you method, an ordinary HashMap will do. For parameters, just use the Map interface, to allow the caller to use whatever Map implementation will fit.

  3. Unnecessarily iterating over the entrySet(). When you’re only interested in the values, you don’t need to use entrySet() and call getValue() on every entry; you can iterate over values() in the first place. Likewise, you would use keySet() when you’re interested in the keys only. Only iterate over entrySet() when you need key and value (or want to perform updates).

  4. Don’t replace pattern matches by spaces, to split by the spaces afterwards. Specify the (combined) pattern directly to split, i.e. line.split("[\\p{Punct}\\d\\s] ").

  5. Don’t use List<String> words = new ArrayList<String>(Arrays.asList(value)); unless you specifically need the features of an ArrayList. Otherwise, just use List<String> words = Arrays.asList(value);
    But when the only thing you’re doing with the list, is addAll to another collection, you can use Collections.addAll(corpus, value); without the List detour.

  6. Don’t use counts.keySet().contains(word) as you can simply use counts.containsKey(word). But you can simplify the entire

    if (counts.containsKey(word)) {
        counts.put(word, counts.get(word)   1);
    } else {counts.put(word, 1);}
    

    to

    counts.merge(word, 1, Integer::sum);
    
  7. The points above yield

    ArrayList<String> corpus = new ArrayList<>();
    for(String line: lemmas.values()) {
        String[] value = line.toLowerCase().trim().split("[\\p{Punct}\\d\\s] ");
        Collections.addAll(corpus, value);
    }
    for (String word : corpus) {
        counts.merge(word, 1, Integer::sum);
    }
    

    But there is no point in performing two loops, the first only to store everything into a potentially large list, to iterate over it a single time. You can perform the second loop’s operation right in the first (resp. only) loop and get rid of the list.

    for(String line: lemmas.values()) {
        for(String word: line.toLowerCase().trim().split("[\\p{Punct}\\d\\s] ")) {
            counts.merge(word, 1, Integer::sum);
        }
    }
    
  8. You already acknowledged that you can’t sort a map, by copying the map into a list and sorting the list in your first variant. In the second variant, you created a List<Entry<String, Integer>> but then, you didn’t use it at all but rather tried to pass the map to sort. (By the way, since Java 8, you can invoke sort directly on a List, no need to call Collections.sort).
    You have to keep copying the map data into a list and sorting the list. For example,

    List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());         
    list.sort(Map.Entry.comparingByValue());
    

    Now, you have to decide whether you change the return type to List<Map.Entry<String, Integer>> or copy the keys of the sorted entries to a new list.

Taking all points together and staying with the original return type, the fixed code looks like

private static List<String> stopWordsFile(Map<String, String> lemmas) {
    Map<String, Integer> counts = new HashMap<>();

    for(String line: lemmas.values()) {
        for(String word: line.toLowerCase().trim().split("[\\p{Punct}\\d\\s] ")) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());         
    list.sort(Map.Entry.comparingByValue());

    List<String> stopwordslist = new ArrayList<>();
    for(Map.Entry<String, Integer> e: list) stopwordslist.add(e.getKey());

//    System.out.println("List after sorting: "   stopwordslist);
    return stopwordslist;
}
  • Related