How to filter the Maximum number of occurrences of objects with the same id using Java 8 Streams

I need to find out the newsId which has received maximum comments.

I have created the News class with instance variables, a constructor, getters and setters.

I've created a Main class to write the logic in Java 8 using streams.

I'm stuck on implementing the Predicate interface to filter out the newsId that has the maximum count in the list of News objects.

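import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class News {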
    int newsId;
    String postByUser;
    String commentByUser;
    String comment;

    public News(int newsId, String postByUser, String commentByUser, String comment) {
        this.newsId = newsId;
        this.postByUser = postByUser;
        this.commentByUser = commentByUser;
        this.comment = comment;
    }

    public int getNewsId() {
        return newsId;
    }

    public void setNewsId(int newsId) {
        this.newsId = newsId;
    }

    public String getPostByUser() {
        return postByUser;
    }

    public void setPostByUser(String postByUser) {
        this.postByUser = postByUser;
    }

    public String getCommentByUser() {
        return commentByUser;
    }

    public void setCommentByUser(String commentByUser) {
        this.commentByUser = commentByUser;
    }

    public String getComment() {
        return comment;
    }

    public void setComment(String comment) {
        this.comment = comment;
    }
}

class Main {

    static List < News > news = Arrays.asList(
        new News(1, "fb_Userpost", "fb_Usercomment", "comment1"),
        new News(2, "insta_userpost", "insta_usercomment", "comment2"),
        new News(1, "whatsapp_userpost", "whatsapp_usercomment", "comment3"),
        new News(1, "whatsapp_userpost", "whatsapp_usercomment", "comment3"),
        new News(3, "whatsapp_userpost", "whatsapp_usercomment", "comment3")
    );
    public static void main(String args[]) {
        // Predicate<News> pr = s -> s ...   <- pr is the predicate I can't figure out
        news.stream()
            .filter(pr)
            .collect(Collectors.toList())
            .forEach(s -> System.out.println(s.getNewsId()));
    }
}

Answer:

Result - a single newsId

I need to find out the newsId which has received maximum comments.

You can't achieve this with filter() alone, and you don't need the filter() operation for it at all: filter() examines each element in isolation, so it has no way of knowing how many times a given newsId occurs elsewhere in the stream.

To find the most frequently commented news, you need to accumulate the data first. That is the job of collect(), not filter().

The most obvious option is to create an intermediate Map which will contain a count for each newsId. To do that, you can use a combination of collectors groupingBy() and counting().
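To make the intermediate step concrete, here is a small sketch that only builds and prints that map for the question's sample data. It assumes a stripped-down stand-in for News that keeps just newsId, and a hypothetical class name CountByNewsId. It also passes TreeMap::new to the three-argument groupingBy() so the printed order is sorted by key; the map type returned by the two-argument version is unspecified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CountByNewsId {

    // Hypothetical stand-in for the question's News class: only the field we group by.
    static class News {
        private final int newsId;
        News(int newsId) { this.newsId = newsId; }
        int getNewsId() { return newsId; }
    }

    // newsId -> number of comments; TreeMap::new only makes the key order predictable.
    static Map<Integer, Long> countByNewsId(List<News> news) {
        return news.stream()
            .collect(Collectors.groupingBy(
                News::getNewsId,      // map's key
                TreeMap::new,         // map implementation
                Collectors.counting() // value
            ));
    }

    public static void main(String[] args) {
        // Same newsId sequence as the question's data: 1, 2, 1, 1, 3
        List<News> news = Arrays.asList(
            new News(1), new News(2), new News(1), new News(1), new News(3));
        System.out.println(countByNewsId(news)); // prints {1=3, 2=1, 3=1}
    }
}
```

Picking the entry with the highest value from this map is then a separate, much simpler step.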

Then you can create a stream over the map entries and pick the entry with the highest value using max() as a terminal operation.

public static void main(String args[]) {
    
    news.stream()
        .collect(Collectors.groupingBy( // creating an intermediate Map<Integer, Long>
            News::getNewsId,            // map's key
            Collectors.counting()       // value
        ))
        .entrySet().stream()               // creating a stream over the map's entries
        .max(Map.Entry.comparingByValue()) // picking the entry with the highest value -> result: Optional<Map.Entry<Integer, Long>>
        .map(Map.Entry::getKey)            // extracting the key -> result: Optional<Integer>
        .ifPresent(System.out::println);   // printing the result if optional is not empty
}

With your sample data, this code will produce the output 1.
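If you need the result as a value rather than printed output, the same pipeline can be wrapped in a method that returns the Optional. This is a sketch with hypothetical names (MostCommentedNews, mostCommentedNewsId) and the same minimal News stand-in. It makes the empty-list case explicit. When several ids tie, it still returns just one of them, which is why the next section needs a different approach.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class MostCommentedNews {

    // Hypothetical stand-in for the question's News class.
    static class News {
        private final int newsId;
        News(int newsId) { this.newsId = newsId; }
        int getNewsId() { return newsId; }
    }

    // Id with the highest comment count, or Optional.empty() for an empty list.
    // On a tie only one of the tied ids is returned.
    static Optional<Integer> mostCommentedNewsId(List<News> news) {
        return news.stream()
            .collect(Collectors.groupingBy(News::getNewsId, Collectors.counting()))
            .entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<News> news = Arrays.asList(
            new News(1), new News(2), new News(1), new News(1), new News(3));
        System.out.println(mostCommentedNewsId(news));                  // prints Optional[1]
        System.out.println(mostCommentedNewsId(Arrays.<News>asList())); // prints Optional.empty
    }
}
```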

Result - a List of newsId having the highest frequency

To handle the case where several newsIds share the highest number of occurrences, you can build a custom collector.

The initial idea is the same as described above, but instead of the max() operation, this time we apply collect() to the stream of map entries and pass a custom collector as its argument.

To create a custom collector, we can utilize the static method Collector.of().

The logic behind the custom collector provided below is as follows:

supplier - intermediate results (map entries) are stored in a Queue.
accumulator - if the next stream element has the same frequency count (the value of a map entry) as the first element in the queue, or the queue is empty, it is added to the queue. If the count of the next element is lower, it is ignored. If the count is higher, the queue is cleared and the next element is added.
combiner - the two queues obtained while executing the stream in parallel are combined using almost the same logic as the accumulator: an empty queue yields to the other one, the queue with the higher count wins, and on a tie the two queues are merged.
finisher - this function transforms the queue of map entries into a list of newsId.

Note that this implementation needs only a single pass over the entry set (finding the maximum first and then filtering would take two), and this performance advantage is what justifies its complexity.

public static void main(String args[]) {
    
    news.stream()
        .collect(Collectors.groupingBy(
            News::getNewsId,
            Collectors.counting()
        ))
        .entrySet().stream()
        .collect(Collector.of(
            ArrayDeque::new,
            (Queue<Map.Entry<Integer, Long>> queue, Map.Entry<Integer, Long> entry) -> {
                if (queue.isEmpty() || queue.element().getValue().equals(entry.getValue())) {
                    queue.add(entry);
                } else if (queue.element().getValue() < entry.getValue()) {
                    queue.clear();
                    queue.add(entry);
                }
            },
            (left, right) -> {
                if (left.isEmpty() || !right.isEmpty()
                    && right.element().getValue() > left.element().getValue())
                    return right;
                if (right.isEmpty() || left.element().getValue() > right.element().getValue())
                    return left;
                
                left.addAll(right);
                return left;
            },
            queue -> queue.stream().map(Map.Entry::getKey).collect(Collectors.toList())
        ))
        .forEach(System.out::println);
}

static List<News> news = Arrays.asList( // News `1` & `2` are the most frequent
    new News(1, "fb_Userpost", "fb_Usercomment", "comment1"),
    new News(2, "insta_userpost", "insta_usercomment", "comment2"),
    new News(2, "insta_userpost", "insta_usercomment", "comment2"),
    new News(2, "insta_userpost", "insta_usercomment", "comment2"),
    new News(1, "whatsapp_userpost", "whatsapp_usercomment", "comment3"),
    new News(1, "whatsapp_userpost", "whatsapp_usercomment", "comment3"),
    new News(3, "whatsapp_userpost", "whatsapp_usercomment", "comment3")
);

Output:

1
2
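If you need this in more than one place, the collector can be packaged as a generic factory method. The sketch below assumes a hypothetical helper named maxKeys() in a hypothetical class MaxKeysCollector. The supplier, accumulator and finisher are the ones from the answer. The combiner is restated with Long.compare() but keeps the same behaviour. It works for any key type whose values are Long counts.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class MaxKeysCollector {

    // Hypothetical helper: keys of all entries sharing the highest Long value.
    static <K> Collector<Map.Entry<K, Long>, ?, List<K>> maxKeys() {
        return Collector.of(
            ArrayDeque::new, // supplier: entries with the highest count seen so far
            (Queue<Map.Entry<K, Long>> queue, Map.Entry<K, Long> entry) -> {
                // accumulator: keep ties, ignore lower counts, reset on a higher count
                if (queue.isEmpty() || queue.element().getValue().equals(entry.getValue())) {
                    queue.add(entry);
                } else if (queue.element().getValue() < entry.getValue()) {
                    queue.clear();
                    queue.add(entry);
                }
            },
            (left, right) -> {
                // combiner: higher count wins, merge on a tie
                if (left.isEmpty()) return right;
                if (right.isEmpty()) return left;
                int cmp = Long.compare(left.element().getValue(), right.element().getValue());
                if (cmp > 0) return left;
                if (cmp < 0) return right;
                left.addAll(right);
                return left;
            },
            // finisher: queue of entries -> list of keys
            queue -> queue.stream().map(Map.Entry::getKey).collect(Collectors.toList())
        );
    }

    public static void main(String[] args) {
        // newsId sequence of the second data set: 1, 2, 2, 2, 1, 1, 3
        List<Integer> ids = Arrays.asList(1, 2, 2, 2, 1, 1, 3);
        List<Integer> mostFrequent = ids.stream()
            .collect(Collectors.groupingBy(id -> id, Collectors.counting()))
            .entrySet().stream()
            .collect(maxKeys());
        // contains 1 and 2; their order follows the map's iteration order
        System.out.println(mostFrequent);
    }
}
```

With News objects you would group by News::getNewsId instead of id -> id; the rest of the chain is unchanged.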

Answer:

First, count the number of times each newsId is referenced. Then, find the maximum count. Finally, keep only those identifiers with the maximum count.

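// newsId -> number of comments
Map<Integer, Long> countByNewsId = news.stream()
    .collect(Collectors.groupingBy(News::getNewsId, Collectors.counting()));
// highest count (null if the list is empty)
Long max = countByNewsId.values().stream().max(Long::compareTo).orElse(null);
// removing from the values() view removes the corresponding entries from the map
countByNewsId.values().removeIf(Predicate.isEqual(max).negate());
// the remaining keys are the ids with the maximum count
Set<Integer> maxCommentedNewsIds = countByNewsId.keySet();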
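This is also where the Predicate from the question fits: once the counts and the maximum are known, a predicate can test each map entry against that maximum. The sketch below is a variant of the same three steps that leaves the count map untouched and filters the entry stream instead. MaxCommentedIds, hasMaxCount and the minimal News stand-in are hypothetical names used for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class MaxCommentedIds {

    // Hypothetical stand-in for the question's News class.
    static class News {
        private final int newsId;
        News(int newsId) { this.newsId = newsId; }
        int getNewsId() { return newsId; }
    }

    // All ids that have the maximum number of comments; the count map is not modified.
    static Set<Integer> maxCommentedNewsIds(List<News> news) {
        Map<Integer, Long> countByNewsId = news.stream()
            .collect(Collectors.groupingBy(News::getNewsId, Collectors.counting()));
        if (countByNewsId.isEmpty()) {
            return Collections.emptySet();
        }
        long max = Collections.max(countByNewsId.values());
        // The predicate only works because max was computed beforehand.
        Predicate<Map.Entry<Integer, Long>> hasMaxCount = e -> e.getValue() == max;
        return countByNewsId.entrySet().stream()
            .filter(hasMaxCount)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<News> news = Arrays.asList(
            new News(1), new News(2), new News(2), new News(2),
            new News(1), new News(1), new News(3));
        // ids 1 and 2; the iteration order of the returned set is unspecified
        System.out.println(maxCommentedNewsIds(news));
    }
}
```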