How to find duplicates in list of lists?

Time:10-22

I have a list of key binds in a class like so:

self.key_bind_list = [["A", "B"], ["C"], ["SHIFT", "NUM6", "NUM7"], ["A", "B"], ["A", "B", "C"]]

* In this case, the two ["A", "B"] sublists should be detected as duplicates, but ["A", "B"] and ["A", "B", "C"] should not.

I'd like to check whether the main list contains any duplicates. The keys within each sublist are unique, order is not important, and I don't need to know which sublists are duplicated.

I've tried using set as follows:

if(len(self.key_bind_list) != len(set(self.key_bind_list))):

This raised TypeError: unhashable type: 'list'.

CodePudding user response:

Assuming you just want to check whether there are duplicates and print which sublists repeat an earlier one, you could keep a list of the sets seen so far:

key_bind_list = [["A", "B"], ["C"], ["SHIFT", "NUM6", "NUM7"], ["B", "A"], ["A", "B", "C"]]

seen = []

for i, sublist in enumerate(key_bind_list):
    s = set(sublist)
    if s in seen:
        print(f'index {i}: there are duplicates in {sublist}')
    seen.append(s)

Output:

index 3: there are duplicates in ['B', 'A']

To just return a bool indicating whether any sublist duplicates another one (regardless of order), you could do something like this:

def has_duplicates(L: list[list]) -> bool:
    seen = []

    for sublist in L:
        s = set(sublist)
        if s in seen:
            return True
        seen.append(s)

    return False


print(has_duplicates(key_bind_list))
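
A possible refinement of the function above, offered as a sketch rather than as part of the original answer: because seen is a list, every "s in seen" test scans all earlier sublists, so the loop is quadratic in the number of sublists. A frozenset is hashable (unlike a list or a set), so seen can itself be a set with average constant-time lookups. The name has_duplicates_hashed is made up for this example.

```python
def has_duplicates_hashed(key_binds: list[list[str]]) -> bool:
    """Return True if two sublists hold the same keys, ignoring order."""
    seen = set()
    for sublist in key_binds:
        key = frozenset(sublist)  # hashable and order-insensitive
        if key in seen:
            return True  # stop at the first repeated combination
        seen.add(key)
    return False


key_bind_list = [["A", "B"], ["C"], ["SHIFT", "NUM6", "NUM7"], ["B", "A"], ["A", "B", "C"]]
print(has_duplicates_hashed(key_bind_list))  # True
```

This only requires that the individual keys are hashable, which holds for strings.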

CodePudding user response:

Using collections.Counter, which is well suited to modeling multisets. To avoid the unhashable type error, each sublist has to be cast to a tuple, but this makes the match sensitive to the order of the elements within each sublist!

from collections import Counter

lst = [["A", "B"], ["C"], ["SHIFT", "NUM6", "NUM7"], ["A", "B"]]

c = Counter(map(tuple, lst))

for k, v in c.items():
    if v > 1:
        print(k, v)
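
If order should not matter, one option (a sketch, assuming the keys within each sublist are unique as stated in the question) is to use frozenset instead of tuple as the Counter key. The variable names counts and duplicates are chosen for this example only.

```python
from collections import Counter

lst = [["A", "B"], ["C"], ["SHIFT", "NUM6", "NUM7"], ["B", "A"], ["A", "B", "C"]]

# frozenset is hashable and ignores element order, so ["A", "B"] and
# ["B", "A"] are counted under the same key.
counts = Counter(frozenset(sublist) for sublist in lst)
duplicates = {keys: n for keys, n in counts.items() if n > 1}

for keys, n in duplicates.items():
    print(sorted(keys), n)  # ['A', 'B'] 2
```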

Alternatively, check all pairs with a set-equality criterion, i.e. two sublists match if they contain the same elements up to ordering:

from itertools import combinations

lst = [["A", "B"], ["C"], ["SHIFT", "NUM6", "NUM7"], ["A", "B"], ["A", "B", "C"]]

for s1, s2 in combinations(map(set, lst), 2):
    if s1 == s2:
        print(s1)
# {'A', 'B'}

CodePudding user response:

len(set([tuple(sorted(x)) for x in L])) != len(L)

A result of True means the list of lists contains duplicates.

  • Output:

    >>> L = [["A", "B"], ["C"], ["SHIFT", "NUM6", "NUM7"], ["A", "B"], ["A", "B", "C"]]
    >>> print(len(set([tuple(sorted(x)) for x in L])) != len(L))
    True
    
    >>> L = [['A', 'B'], ['C'], ['SHIFT', 'NUM6', 'NUM7'], ['A', 'B', 'C']]
    >>> print(len(set([tuple(sorted(x)) for x in L])) != len(L))
    False
    
    >>> L = [["A", "B"], ["C"], ["SHIFT", "NUM6", "NUM7"], ["B", "A"], ["A", "B", "C"]]
    >>> print(len(set([tuple(sorted(x)) for x in L])) != len(L))
    True
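
One difference worth knowing, shown here as a hedged sketch: the sorted-tuple key and a frozenset key agree as long as the keys inside each sublist are unique (as the question assumes), but they diverge if a sublist repeats a key. The sorted tuple keeps the repetition, the frozenset drops it. The function names below are invented for the comparison.

```python
def has_duplicates_sorted(lists):
    # Sorted tuples keep repeated elements: ("A", "A", "B") != ("A", "B").
    return len({tuple(sorted(x)) for x in lists}) != len(lists)


def has_duplicates_frozenset(lists):
    # Frozensets collapse repeated elements: {"A", "A", "B"} == {"A", "B"}.
    return len({frozenset(x) for x in lists}) != len(lists)


binds = [["A", "A", "B"], ["A", "B"]]
print(has_duplicates_sorted(binds))     # False
print(has_duplicates_frozenset(binds))  # True
```

The sorted variant also needs the elements to be mutually comparable, while the frozenset variant only needs them to be hashable.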
    