Java: get duplicated elements from more than 2 lists

I have 9 lists and I want to compare all of them and get the duplicated elements.

I tried the retainAll() method, but it deletes the elements that are not duplicated.

For example, when I only compare SUPPORT_NAMES and PROJECT_AND_SUPPORT_NAMES, I get the duplicated value.

But when I compare them like below, it returns an empty list.

        MANAGEMENT_NAMES.retainAll(SUPPORT_NAMES);
        MANAGEMENT_NAMES.retainAll(PROJECT_AND_SUPPORT_NAMES);
        MANAGEMENT_NAMES.retainAll(SALES_NAMES);
        MANAGEMENT_NAMES.retainAll(MARKETING_NAMES);
        MANAGEMENT_NAMES.retainAll(ACADEMY_NAMES);
        MANAGEMENT_NAMES.retainAll(DEVELOPMENT_NAMES);
        MANAGEMENT_NAMES.retainAll(HR_AND_ADMINISTRATION_NAMES);
        MANAGEMENT_NAMES.retainAll(WAREHOUSE_NAMES);

        SUPPORT_NAMES.retainAll(PROJECT_AND_SUPPORT_NAMES);
        SUPPORT_NAMES.retainAll(SALES_NAMES);
        SUPPORT_NAMES.retainAll(MARKETING_NAMES);
        SUPPORT_NAMES.retainAll(ACADEMY_NAMES);
        SUPPORT_NAMES.retainAll(DEVELOPMENT_NAMES);
        SUPPORT_NAMES.retainAll(HR_AND_ADMINISTRATION_NAMES);
        SUPPORT_NAMES.retainAll(WAREHOUSE_NAMES);

        PROJECT_AND_SUPPORT_NAMES.retainAll(SALES_NAMES);
        PROJECT_AND_SUPPORT_NAMES.retainAll(MARKETING_NAMES);
        PROJECT_AND_SUPPORT_NAMES.retainAll(ACADEMY_NAMES);
        PROJECT_AND_SUPPORT_NAMES.retainAll(DEVELOPMENT_NAMES);
        PROJECT_AND_SUPPORT_NAMES.retainAll(HR_AND_ADMINISTRATION_NAMES);
        PROJECT_AND_SUPPORT_NAMES.retainAll(WAREHOUSE_NAMES);
.
.
.

        HR_AND_ADMINISTRATION_NAMES.retainAll(WAREHOUSE_NAMES);
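
A minimal sketch of the behaviour described above, assuming hypothetical sample names rather than the real department lists. Each retainAll() call mutates the receiver and keeps only the elements also contained in the argument, so chaining it over many lists leaves only the names present in every one of them. The helper name commonToAll is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RetainAllDemo {

    // Returns the names present in every list, without mutating the inputs.
    static List<String> commonToAll(List<List<String>> lists) {
        if (lists.isEmpty()) {
            return new ArrayList<>();
        }
        // Copy the first list so retainAll() does not modify the caller's data.
        List<String> result = new ArrayList<>(lists.get(0));
        for (List<String> other : lists.subList(1, lists.size())) {
            result.retainAll(other); // keeps only elements also contained in 'other'
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<String> support = Arrays.asList("Jerry", "Stephen");
        List<String> projectAndSupport = Arrays.asList("Jerry", "Earl");
        List<String> sales = Arrays.asList("Chris", "Ryan");

        // Two lists sharing a name: the shared name survives.
        System.out.println(commonToAll(Arrays.asList(support, projectAndSupport)));
        // A third list without "Jerry": nothing survives.
        System.out.println(commonToAll(Arrays.asList(support, projectAndSupport, sales)));
    }
}
```

With this sample data the first call prints [Jerry] and the second prints [], which matches the empty result seen when all nine lists are chained.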

I want to get a result like:

Department A - Jerry

Department B - Chris

Department C - Jerry

Department D - Chris

Department E - Chris

Department F - Jerry

CodePudding user response：

If I understand you correctly, a duplicate is a name that appears more than once across all the departments taken together. As a result, you want the duplicated names for each department separately.

I want to get a result like:

Department A - Jerry Department B - Chris

So my idea is to first create a Set of all the duplicates across all the departments, and then create a list of duplicates for each department separately, based on that Set.

This solution has time complexity linear in the total number of names.

Method getDuplicatesForAllDepartments() iterates over all the names in all departments and counts the number of occurrences of each name. It retains only names that appear more than once and saves them to a Set.

Method getDuplicatesForOneDepartment() determines which names in the given department are contained in the Set of duplicates.

    public static void main(String[] args) {
        List<List<String>> departments = List.of(SUPPORT_NAMES, PROJECT_AND_SUPPORT_NAMES, SALES_NAMES, ... etc);
        // List.of() works with Java 9 onwards.
        // For Java 8 you can add the departments one by one using Collections.addAll():
        // List<List<String>> departments = new ArrayList<>();
        // Collections.addAll(departments, SUPPORT_NAMES, PROJECT_AND_SUPPORT_NAMES, SALES_NAMES, ... etc);

        Set<String> allDuplicates = getDuplicatesForAllDepartments(departments);
        List<String> duplicates1 = getDuplicatesForOneDepartment(SUPPORT_NAMES, allDuplicates);
        List<String> duplicatesProjectAndSupport = getDuplicatesForOneDepartment(PROJECT_AND_SUPPORT_NAMES, allDuplicates);
        // ... etc
    }

    public static Set<String> getDuplicatesForAllDepartments(List<List<String>> departments) {
        return departments.stream()
                .flatMap(List::stream)
                .collect(Collectors.groupingBy(Function.identity(),
                                               Collectors.counting()))
                .entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }
    
    public static List<String> getDuplicatesForOneDepartment(List<String> department, Set<String> allDuplicates) {
        return department.stream()
                .filter(allDuplicates::contains)
                .collect(Collectors.toList());
    }

An imperative implementation of the methods getDuplicatesForAllDepartments() and getDuplicatesForOneDepartment() looks like this:

    public static Set<String> getDuplicatesForAllDepartments(List<List<String>> departments) {
        Map<String, Integer> nameToCount = new HashMap<>();
        for (List<String> department: departments) {
            for (String name: department) {
                nameToCount.merge(name, 1, Integer::sum);
            }
        }

        Set<String> duplicates = new HashSet<>();
        for (Map.Entry<String, Integer> entry: nameToCount.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static List<String> getDuplicatesForOneDepartment(List<String> department, Set<String> allDuplicates) {
        List<String> duplicates = new ArrayList<>();
        for (String name: department) {
            if (allDuplicates.contains(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

I've used your example to show that both implementations are working.

        List<String> departmentA = List.of("Jerry", "Stephen", "Daniel");
        // for Java 8 Arrays.asList("Jerry", "Stephen", "Daniel");
        List<String> departmentB = List.of("Chris", "Earl", "Ryan");
        List<String> departmentC = List.of("Jerry", "Brown", "Micheal");

        List<List<String>> departments = new ArrayList<>();
        Collections.addAll(departments, departmentA, departmentB, departmentC);
        System.out.println(getDuplicatesForAllDepartments(departments));

Output:

[Jerry]
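
To get closer to the "Department A - Jerry" format asked for in the question, the two steps can be combined into one report. This is only a sketch under a few assumptions: the departments are kept in a LinkedHashMap keyed by a display name (the question does not say how the names are stored), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DepartmentDuplicates {

    // Builds one "Department - Name" line per duplicated name, in department order.
    static List<String> report(Map<String, List<String>> departments) {
        // Step 1: count how often each name occurs across all departments.
        Map<String, Integer> nameToCount = new HashMap<>();
        for (List<String> names : departments.values()) {
            for (String name : names) {
                nameToCount.merge(name, 1, Integer::sum);
            }
        }
        // Step 2: for each department, keep the names that were counted more than once.
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : departments.entrySet()) {
            for (String name : entry.getValue()) {
                if (nameToCount.get(name) > 1) {
                    lines.add(entry.getKey() + " - " + name);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the departments in insertion order.
        Map<String, List<String>> departments = new LinkedHashMap<>();
        departments.put("Department A", Arrays.asList("Jerry", "Stephen", "Daniel"));
        departments.put("Department B", Arrays.asList("Chris", "Earl", "Ryan"));
        departments.put("Department C", Arrays.asList("Jerry", "Brown", "Micheal"));
        report(departments).forEach(System.out::println);
    }
}
```

With the sample data above this prints "Department A - Jerry" and "Department C - Jerry". Note that a name repeated inside a single department also counts as a duplicate here and would be printed once per occurrence.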

CodePudding user response：

You said that this was the meaning of duplicate that you wanted:

"elements that appear more than once in any one of the lists"

Here is a simple solution.

    Set<String> duplicates = new HashSet<>();

    for (List<String> list: lists) {
        Set<String> all = new HashSet<>();
        for (String s: list) {
            if (!all.add(s)) {
                duplicates.add(s);
            }
        }
    }

The logic is that a string will only be added to duplicates if it was already in all. (The add method returns true if the element wasn't in the set already, and false otherwise. Check the javadocs.)

Since you want elements that are duplicate within a given list, we need to reset all for each list.

This could be coded in other ways (e.g. using Java 8 streams), but in a situation like this, the most important thing is that you can understand the code.
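
For comparison, here is one possible streams version of the same idea. It is a sketch, not the only way to write it: it counts occurrences per list with Collectors.groupingBy() instead of tracking a "seen" set, and the class name, method name and sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class WithinListDuplicates {

    // Collects every string that occurs more than once inside at least one of the lists.
    static Set<String> duplicatesWithinAnyList(List<List<String>> lists) {
        return lists.stream()
                // Count occurrences separately for each list, so the counts
                // are effectively "reset" between lists.
                .flatMap(list -> list.stream()
                        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                        .entrySet().stream()
                        .filter(entry -> entry.getValue() > 1)
                        .map(Map.Entry::getKey))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Hypothetical sample data: "Jerry" is repeated inside the first list,
        // "Chris" appears once in each list.
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("Jerry", "Chris", "Jerry"),
                Arrays.asList("Chris", "Earl"));
        System.out.println(duplicatesWithinAnyList(lists));
    }
}
```

With this data only Jerry is reported, because Chris never appears twice within the same list.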