I am trying to get countryCode of the 3 countries that has max cities by usimng Java Stream on the following entity:
City:
id | name | countryCode |
------------------------------
1 | Berlin | DE |
2 | Munich | DE |
3 | Köln | DE |
4 | Paris | FR |
5 | Kopenhag | DK |
...
I tried something, but not working as expected. So, what is the most proper way to get top 3 country code (top 3 mostly repeated country code)?
final Map<String, City> airportCodes2 = airportService.getCities().stream()
.map(City::getCountryCode, Function.identity())
.toMap();
CodePudding user response:
Sorting
The simplest approach would be to group the data by countryCode
by generating an auxiliary map of type Map<String, Long>
, associating each countryCode
with the count of cities.
Then generate a stream over the map entries and sort it in the reverse order by Value (i.e. in Descending order of the city count).
That how it might be implemented:
public static List<String> getTopNCodes(List<City> cities,
int limit) {
return cities.stream()
.collect(Collectors.groupingBy( // creates a Map<String, Long>
City::getCountryCode,
Collectors.counting()
))
.entrySet().stream()
.sorted(Map.Entry.<String, Long>comparingByValue().reversed())
.limit(limit) // <- retain top N
.map(Map.Entry::getKey)
.toList();
}
Time complexity of this approach is O(n * log n) doe to sorting, which would be downside would be if number of elements to retrieve it small (like 3
), meanwhile the data is massive.
We can do better.
Partial sorting using Custom Collector
Instead of sorting all entries of the auxiliary map, we can perform partial sorting using PriorityQueue
(this class is an implementation of the Binary Heap offered by JDK).
For that, we can implement a custom collector using static factory method Collector.of()
.
An instance of PriorityQueue
would be used as a mutable container of the Collector. Each map entry from the stream would be compared with the root element of the Heap. If the element is greater (entry holds a larger count) the next entry gets added to the queue root. And in case if the limit is exceeded, the root element should be removed.
To make the code reusable we can gentrify it. The first part (creating an intermediate map with Values representing frequencies of Keys) remains the same.
public static <T, K> List<K> getTopN(List<T> list,
Function<T, K> keyExtractor,
int limit) {
return list.stream()
.collect(Collectors.groupingBy(
keyExtractor,
Collectors.counting()
))
.entrySet().stream()
.collect(getMaxN(
limit,
Map.Entry.<K, Long>comparingByValue().reversed(),
Map.Entry::getKey
));
}
Min Heap-based Collector:
public static <T, R> Collector<T, ?, List<R>> getMaxN(int size,
Comparator<T> comparator,
Function<T, R> keyExtractor) {
return Collector.of(
() -> new PriorityQueue<>(comparator),
(Queue<T> queue, T next) -> tryAdd(queue, next, comparator, size),
(Queue<T> left, Queue<T> right) -> {
right.forEach(next -> tryAdd(left, next, comparator, size));
return left;
},
(Queue<T> queue) -> queue.stream().map(keyExtractor).toList(),
Collector.Characteristics.UNORDERED
);
}
public static <T> void tryAdd(Queue<T> queue, T next, Comparator<T> comparator, int size) {
if (queue.size() == size && comparator.compare(queue.element(), next) < 0)
queue.remove(); // if next value is greater than the smallest element in the queue and max size has been exceeded the smallest element needs to be removed from the queue
if (queue.size() < size) queue.add(next);
}