Recently, I was asked this question during an interview:
How to find the duplicate elements in a List of Integer without using Java Stream's distinct() method?
This should be done with Java's Stream API, but without using the distinct() method.
Stub-code:
List<Integer> listOfInt = new ArrayList<>();
CodePudding user response:
You can do this by filtering on element frequency. I have not looked at optimization, but this does not use distinct().
import java.util.Set;
import java.util.List;
import java.util.ArrayList;
import static java.util.stream.Collectors.toSet;
import java.util.Collections;

public class DetectDuplicates {
    public static void main(String[] args) {
        List<Integer> companyIds = new ArrayList<>();
        companyIds.add(1);
        companyIds.add(1);
        companyIds.add(2);
        companyIds.add(3);
        companyIds.add(3);

        // Keep only the elements that occur more than once in the source list
        Set<Integer> duplicateCompanies = companyIds
                .stream()
                .filter(company -> Collections.frequency(companyIds, company) > 1)
                .collect(toSet());

        System.out.println("Duplicate companies " + duplicateCompanies);
    }
}
This will print
Duplicate companies [1, 3]
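As a side note (my own sketch, not part of the answer above): the approach above calls Collections.frequency once per element, which rescans the whole list each time. The same frequency idea can be expressed in a single pass with Collectors.groupingBy and counting(); the class name and sample values here are my own.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DetectDuplicatesSinglePass {
    public static void main(String[] args) {
        List<Integer> companyIds = List.of(1, 1, 2, 3, 3);

        // Count occurrences of each element in a single pass over the list
        Map<Integer, Long> counts = companyIds.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Keep only the values that were seen more than once
        Set<Integer> duplicateCompanies = counts.entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());

        System.out.println("Duplicate companies " + duplicateCompanies); // Duplicate companies [1, 3]
    }
}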
CodePudding user response:
The distinct() method will not allow you "to find duplicated elements". You will never even know whether there were any duplicates in the source by using distinct() alone. As its name suggests, it gives you the unique elements and nothing more.
Judging by the currently accepted answer, I can assume that you might have framed the problem the wrong way and that the actual problem statement should be: "how to eliminate duplicates without using distinct()".
How to eliminate duplicates
The question asked in the interview: How to find the duplicate elements from a List of Integer without using Java Stream's distinct() method?
To answer this interview question properly, you were expected to demonstrate knowledge of how distinct() works.
According to the documentation, distinct() is a stateful intermediate operation. Under the hood, it maintains a LinkedHashSet to ensure uniqueness of the elements and, at the same time, preserve the initial order of the stream (if the stream is ordered).
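As a small illustration (my own sketch, not from the original answer) of what distinct() does and does not tell you: it removes repetitions while preserving encounter order, but it gives no hint about which elements were duplicated.

import java.util.List;
import java.util.stream.Collectors;

public class DistinctDemo {
    public static void main(String[] args) {
        List<Integer> source = List.of(3, 1, 3, 2, 1);

        // distinct() keeps the first occurrence of each element, in encounter order
        List<Integer> unique = source.stream()
                .distinct()
                .collect(Collectors.toList());

        System.out.println(unique); // prints [3, 1, 2] - no hint of which elements were repeated
    }
}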
If we don't care about the order, we can store the elements into a general-purpose implementation of the Set interface, HashSet. Here is how the stream-based code would look:
Set<Integer> uniqueElements = sourceList.stream().collect(Collectors.toSet());
And I guess you were also expected to conclude (since you attached the performance tag to the question) that there is no need to build a stream pipeline and then apply a collector just to dump all list elements into a set, because that is a convoluted and less efficient way to do the same thing that can be done with a parameterized constructor of the HashSet class:
Set<Integer> uniqueElements = new HashSet<>(sourceList);
How to find duplicated elements
If your question was framed correctly, then it's a completely different task:
- you need to determine, for each element, whether it's a duplicate;
- and retain only the duplicates.
One way to do that is by generating an intermediate Map<Integer, Boolean>, where the key is the element itself and the value denotes whether that key has duplicates in the source list. That can be done with the built-in collector toMap():
List<Integer> duplicates = sourceList.stream()
        .collect(Collectors.toMap(
                Function.identity(),   // the key - the element itself
                next -> false,         // `false` - not yet proved to be a duplicate (encountered for the first time)
                (left, right) -> true  // `true` - the key collided, so the element is a duplicate
        ))
        .entrySet().stream()
        .filter(Map.Entry::getValue)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
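For reference, a self-contained sketch wrapping the snippet above with a sample sourceList (the class name and sample values are my own); for an input of [1, 1, 2, 3, 3] it yields [1, 3].

import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FindDuplicates {
    public static void main(String[] args) {
        List<Integer> sourceList = List.of(1, 1, 2, 3, 3);

        List<Integer> duplicates = sourceList.stream()
                .collect(Collectors.toMap(
                        Function.identity(),   // key: the element itself
                        next -> false,         // first occurrence: not yet a duplicate
                        (left, right) -> true  // merge on key collision: it is a duplicate
                ))
                .entrySet().stream()
                .filter(Map.Entry::getValue)   // keep only the entries marked as duplicates
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());

        System.out.println(duplicates); // prints [1, 3] (order may vary, since toMap() is backed by a HashMap)
    }
}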