Home > other >  Find duplicates in two lists List<int[]> using Streams
Find duplicates in two lists List<int[]> using Streams

Time:05-05

Find duplicates in two lists List<int[]> with streams

I have a list of int arrays. I need to find duplicates of these arrays using 2 lists of int arrays.

I did try to implement it, but I'm getting an empty array.

keys = [[1,1,0], [0,0,1], [1,2,1], [1,3,1], [1,3,2]];
phaseKey = [[1,3,2], [1,2,1], [0,0,2], [1,2,3], [1,0,3]];
desired result: [[[1,3,2]], [1,2,1]];

My code:

Stream.concat(Stream.of(keys.stream().flatMapToInt(Arrays::stream)),                     
        Stream.of(phaseKey.stream().flatMapToInt(Arrays::stream)))
                            .collect(Collectors.groupingBy(
                                    Function.identity(),
                                    Collectors.counting()))
                            .entrySet()
                            .stream()
                            .filter(m -> m.getValue() > 1)
                            .map(Map.Entry::getKey)
                            .toArray();

CodePudding user response:

Given your two lists:

List<int[]> keys = // ....
List<int[]> phaseKey = //...

You just need to filter to find common arrays in both lists:

List<int[]> duplicates = keys.stream()
        .filter(k -> phaseKey.stream().anyMatch(p -> Arrays.equals(p,k)))
        .collect(Collectors.toList());

CodePudding user response:

As @Pshemo has already pointed out in the comment, arrays have no proper implementation of the equals() and hashCode() methods. Which are crucial if for every object that is going to be utilized as a key in a map.

Arrays in Java are very special. They are not being created by instantiating a class, but they are objects, and they derive all methods from the java.lang.Object class. Hence, equals() and hashCode() that are accessible on with arrays could be used to establish uniqueness, i.e. whether the two arrays are represented by the same object in memory.

In order to determine if the contents of the two array is identical you need to use static methods equals() and hashCode() provided by the Arrays utility class. To be able to make use this functionality you can create a class that wraps over an array implement equals/hashCode contract base on the array contents.

I've chosen to implement this wrapper as a Java 16 record to make it leaner, but you can change it to be a regular class.

public record ArrayWrapper(int[] arr) {
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        ArrayWrapper other = (ArrayWrapper) o;
        return Arrays.equals(arr, other.arr);
    }
    
    @Override
    public int hashCode() {
        return Arrays.hashCode(arr);
    }
}

Now any array wrapped by this record is good to go to be used with hash-based data structures.

Because your example is ambiguous, below I've listed two different implementation.

Selecting arrays that are present in both lists

In order to make the process of checking whether a particular array contained in the phaseKey list is also present in the keys list, we create a HashSet of ArrayWrapper objects and then perform checks against this set. That will allow addressing this task in a linear time, performing only a single pass though each of the lists.

public static void main(String[] args) {
    List<int[]> keys = List.of(new int[]{1, 1, 0}, new int[]{0, 0, 1}, new int[]{1, 2, 1},
                               new int[]{1, 3, 1}, new int[]{1, 3, 2});
    List<int[]> phaseKey = List.of(new int[]{1, 3, 2}, new int[]{1, 2, 1}, new int[]{0, 0, 2},
                               new int[]{1, 2, 3}, new int[]{1, 0, 3});

    Set<ArrayWrapper> wrappedKeys = keys.stream().map(ArrayWrapper::new).collect(Collectors.toSet());
    
    List<int[]> result = phaseKey.stream()
        .map(ArrayWrapper::new)
        .filter(wrappedKeys::contains)
        .distinct()                    // to ensure that each array will be present in the array only once
        .map(ArrayWrapper::arr)
        .collect(Collectors.toList()); // toList() with Java 16 
    
    result.forEach(arr -> System.out.println(Arrays.toString(arr)));
}

Fetching arrays having a number of occurrences more than one

To find out if a particular array appears more than once in one of the given lists (or in both) we can create an intermediate map Map<ArrayWrapper,Boolean> by assigning the initial value for any key as false (not a duplicate), and returning true from the mergeFunction that is meant to resolve duplicates.

public static void main(String[] args) {
    List<int[]> keys = List.of(new int[]{1, 1, 0}, new int[]{0, 0, 1}, new int[]{1, 2, 1},
        new int[]{1, 3, 1}, new int[]{1, 3, 2});
    List<int[]> phaseKey = List.of(new int[]{1, 3, 2}, new int[]{1, 2, 1}, new int[]{0, 0, 2},
        new int[]{1, 2, 3}, new int[]{1, 0, 3});
    
    List<int[]> result = Stream.of(keys, phaseKey)
        .flatMap(List::stream)
        .map(ArrayWrapper::new)
        .collect(Collectors.toMap(    // creates an intermediate map Map<ArrayWrapper, Boolean>
            Function.identity(),
            next -> false,            // first occurrence
            (left, right) -> true))   // all subsequent occurrences
        .entrySet().stream()
        .filter(Map.Entry::getValue)
        .map(Map.Entry::getKey)
        .map(ArrayWrapper::arr)
        .collect(Collectors.toList()); // toList() with Java 16 
    
    result.forEach(arr -> System.out.println(Arrays.toString(arr)));
}

Output

[1, 3, 2]
[1, 2, 1]
  • Related