Java count word frequency using stream

Time:07-03

Hey, I need to count the frequency of words and return a string listing them. I have to omit words that have fewer than 4 characters and words that occur fewer than 10 times. The result has to be ordered from highest to lowest count, and alphabetically where the count is the same. Here's the code.

import java.util.*;
import java.util.stream.*;

public class Words {

    public String countWords(List<String> lines) {

    String text = lines.toString();
    String[] words = text.split("(?U)\\W+");

    Map<String, Long> freq = Arrays.stream(words).sorted()
        .collect(Collectors.groupingBy(String::toLowerCase,
            Collectors.counting()));

    LinkedHashMap<String, Long> freqSorted = freq.entrySet().stream()
        .filter(x -> x.getKey().length() > 3)
        .filter(y -> y.getValue() > 9)
        .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
        .collect(Collectors.toMap(Map.Entry::getKey,
            Map.Entry::getValue, (oldValue, newValue) -> oldValue,
            LinkedHashMap::new));

    return freqSorted.keySet().stream()
        .map(key -> key + " - " + freqSorted.get(key))
        .collect(Collectors.joining("\n", "", ""));
    }
}

I can't change the argument of this method. I'm having trouble sorting alphabetically after sorting by value; I tried using thenComparing but couldn't make it work. Aside from that, I'd appreciate any feedback on how to reduce the number of lines so I don't have to stream three times.

CodePudding user response:

Another approach, which does it in one go without collecting into intermediate maps, is to wrap your grouping collector in collectingAndThen, where you can format the final result:

public String countWords(List<String> lines) {
    String text = lines.toString();
    String[] words = text.split("(?U)\\W+");

    return Arrays.stream(words)
                .filter(s -> s.length() > 3)
                .collect(Collectors.collectingAndThen(
                         Collectors.groupingBy(String::toLowerCase, Collectors.counting()),
                         map -> map.entrySet()
                                 .stream()
                                 .filter(e -> e.getValue() > 9)
                                 .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                                         .thenComparing(Map.Entry.comparingByKey()))
                                 .map(e -> String.format("%s - %d", e.getKey(), e.getValue()))
                                 .collect(Collectors.joining(System.lineSeparator()))));
}
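
To try the one-pass approach on a small input, here is a minimal, self-contained sketch. The class name WordsDemo and the parameters minLength and minCount are hypothetical (the original method hard-codes 4 and 10), and the lines are split one by one with flatMap instead of going through lines.toString(); treat it as an illustration under those assumptions rather than a drop-in replacement.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class WordsDemo {

    // Hypothetical variant of countWords: the thresholds are parameters
    // so the behaviour can be checked with a handful of words.
    static String countWords(List<String> lines, int minLength, long minCount) {
        return lines.stream()
                // split each line on runs of non-word characters (Unicode-aware)
                .flatMap(line -> Arrays.stream(line.split("(?U)\\W+")))
                .filter(w -> w.length() >= minLength)
                .collect(Collectors.collectingAndThen(
                        // count occurrences case-insensitively
                        Collectors.groupingBy(String::toLowerCase, Collectors.counting()),
                        map -> map.entrySet().stream()
                                .filter(e -> e.getValue() >= minCount)
                                // highest count first, ties broken alphabetically
                                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                                        .thenComparing(Map.Entry.comparingByKey()))
                                .map(e -> e.getKey() + " - " + e.getValue())
                                .collect(Collectors.joining("\n"))));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("pear Apple pear", "apple fig kiwi");
        // "fig" is dropped (shorter than 4 characters); apple and pear tie on count.
        System.out.println(countWords(lines, 4, 1));
    }
}
```

Calling countWords(lines, 4, 10) would correspond to the thresholds in the question.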

CodePudding user response:

Here is one approach. I am using your frequency count map as the source.

  • First, define a comparator.
  • Then sort the entries of the existing map with it.
  • toMap takes a key mapper, a value mapper, a merge function, and a map supplier; LinkedHashMap is used here to preserve the sorted order.
Comparator<Entry<String, Long>> comp =
        Entry.comparingByValue(Comparator.reverseOrder());
comp = comp.thenComparing(Entry.comparingByKey());

Map<String, Long> freqSorted = freq.entrySet().stream()
        .filter(x -> x.getKey().length() > 3
                && x.getValue() > 9)
        .sorted(comp)
        .collect(Collectors.toMap(Entry::getKey,
                Entry::getValue, (a, b) -> a,
                LinkedHashMap::new));

Notes:

  • To verify that the sorting is correct, you can comment out the filter and use fewer words.
  • You do not need to sort the initial stream of words when building the frequency count, since the entries are sorted before they go into the final map.
  • The merge function is syntactically required but never called, since there are no duplicate keys.
  • I chose not to use TreeMap because once the stream is sorted, there is no need to sort again.
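
The first note can be tried out with a sketch like the following, which applies the same comparator to a tiny map with no filter. The class name SortCheck, the helper sortByCountThenWord, and the sample counts are made up for illustration.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.stream.Collectors;

class SortCheck {

    // Sorts a frequency map by count (descending), then by word (ascending).
    static Map<String, Long> sortByCountThenWord(Map<String, Long> freq) {
        Comparator<Entry<String, Long>> comp =
                Entry.comparingByValue(Comparator.reverseOrder());
        comp = comp.thenComparing(Entry.comparingByKey());

        return freq.entrySet().stream()
                .sorted(comp)
                .collect(Collectors.toMap(Entry::getKey, Entry::getValue,
                        (a, b) -> a,           // never called: keys are unique
                        LinkedHashMap::new));  // keeps the sorted encounter order
    }

    public static void main(String[] args) {
        // Hypothetical sample counts, small enough to check by eye.
        Map<String, Long> freq = Map.of("pear", 2L, "kiwi", 1L, "apple", 2L, "plum", 1L);
        System.out.println(sortByCountThenWord(freq));
    }
}
```

With these sample counts, the words with count 2 come first (apple before pear), followed by the words with count 1 (kiwi before plum).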

CodePudding user response:

The problem may be your LinkedHashMap: it only keeps insertion order and does not sort its entries itself. You can try a TreeMap, which keeps its entries sorted, though note that it sorts by key (the word), not by count.

I also think you shouldn't focus on using as few lines as possible; try to keep the code as readable as possible for the future instead. What you have there is fine, because you split the streams into logical parts: counting, sorting, and joining.

To swap to TreeMap, just change the variable type and the map supplier passed to the collector. It would look like this:

import java.util.*;
import java.util.stream.*;

public class Words {

    public String countWords(List<String> lines) {

    String text = lines.toString();
    String[] words = text.split("(?U)\\W+");

    Map<String, Long> freq = Arrays.stream(words).sorted()
        .collect(Collectors.groupingBy(String::toLowerCase,
            Collectors.counting()));

    TreeMap<String, Long> freqSorted = freq.entrySet().stream()
        .filter(x -> x.getKey().length() > 3)
        .filter(y -> y.getValue() > 9)
        .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
        .collect(Collectors.toMap(Map.Entry::getKey,
            Map.Entry::getValue, (oldValue, newValue) -> oldValue,
            TreeMap::new));

    return freqSorted.keySet().stream()
        .map(key -> key + " - " + freqSorted.get(key))
        .collect(Collectors.joining("\n", "", ""));
    }
}
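
To see how the two map types order the same entries, here is a small check. The class name MapOrderCheck, the helper keysInOrder, and the sample data are hypothetical; the sketch only relies on the documented iteration order of LinkedHashMap (insertion order) and TreeMap (natural order of the keys).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapOrderCheck {

    // Returns the keys in the map's own iteration order.
    static List<String> keysInOrder(Map<String, Long> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Entries inserted in "count descending" order (made-up sample data).
        Map<String, Long> linked = new LinkedHashMap<>();
        linked.put("pear", 5L);
        linked.put("apple", 3L);
        linked.put("kiwi", 1L);

        // Copying into a TreeMap re-sorts the entries by key.
        Map<String, Long> tree = new TreeMap<>(linked);

        System.out.println(keysInOrder(linked)); // insertion order: pear, apple, kiwi
        System.out.println(keysInOrder(tree));   // key order: apple, kiwi, pear
    }
}
```

So if the entries are collected into a TreeMap after sorting by count, the iteration order follows the words alphabetically rather than the counts.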