Count Amount of duplicate Sublists in list with Python

Time:12-07

I've been researching this for quite some time but can't find how to do it properly. I have a list that consists of a total of 113287 sublists, each holding two integers of 2-3 digits.

my_list = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]

Now I want to count how many distinct sublists exist more than once. I don't want the total number of duplicates, and the index is irrelevant: I just want to know which combinations of values exist more than once.

The result for the example should be "2", since only the sub-lists "[222, 222]" and "[123, 456]" exist more than once.

If possible and only if it doesn't overcomplicate things, I would like to do it without external libraries.

I just can't seem to figure it out, any help is appreciated.

Answer:

Use collections.Counter (part of the standard library, so no external package is needed) to count the elements, then loop over the result to keep only those with a count greater than 1, and sum:

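from collections import Counter

my_list = [[123, 456], [111, 111], [222, 222], [333, 333],
           [123, 456], [222, 222], [123, 456]]

# lists are unhashable, so convert each sublist to a tuple before counting
c = Counter(map(tuple, my_list))
# True counts as 1, so this sums the number of entries seen more than once
number = sum(v > 1 for v in c.values())
print(number)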

output: 2

NB. You need to convert the sublists to tuples: lists are unhashable, so Counter cannot use them as keys directly.
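Since the question also asks which combinations repeat, the same Counter can list them. A minimal sketch reusing the example data; the names `repeated` and `number` are just illustrative:

```python
from collections import Counter

my_list = [[123, 456], [111, 111], [222, 222], [333, 333],
           [123, 456], [222, 222], [123, 456]]

# Tuples are hashable, so Counter can use them as keys
c = Counter(map(tuple, my_list))

# Keep only the combinations seen more than once, converted back to lists
repeated = [list(k) for k, v in c.items() if v > 1]
number = len(repeated)

print(repeated)  # [[123, 456], [222, 222]]
print(number)    # 2
```

Counter keeps keys in first-seen order, so the repeated combinations come out in the order they first appear in the list.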

Answer:

You can iterate over the set of unique sublists. But because lists are unhashable, you'll first need to convert each list in lst to a tuple. Then simply count how many times each of these sublists appears in lst:

lst = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]
out = sum(1 for l in set(map(tuple,lst)) if lst.count(list(l))>1)

Output:

2
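Note that `lst.count` rescans the whole list for every distinct sublist, so with 113287 sublists this can get slow. A single-pass sketch using only built-in sets (the names `seen` and `dupes` are illustrative, not from the answer above):

```python
lst = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]

seen = set()   # tuples encountered at least once
dupes = set()  # tuples encountered at least twice
for sub in lst:
    key = tuple(sub)  # lists are unhashable; tuples can go in a set
    if key in seen:
        dupes.add(key)
    else:
        seen.add(key)

print(len(dupes))  # 2
```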

Also, if you want [12, 34] and [34, 12] to be treated as the same sublist (so that [[12, 34], [34, 12]] counts as one sublist appearing twice), then building off of @mozway's answer, you can do:

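from collections import Counter

# If the reverse of l already appears earlier, replace l with that earlier form
# so both orientations count as the same sublist (this modifies my_list in place)
for i, l in enumerate(my_list):
    if l[::-1] in my_list[:i]:
        my_list[i] = l[::-1]

c = Counter(map(tuple, my_list))
number = sum(v > 1 for v in c.values())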
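An alternative sketch that leaves my_list untouched: normalize each pair by sorting it, so [12, 34] and [34, 12] map to the same key. This assumes the order inside a sublist never matters; the sample data below is made up for illustration.

```python
from collections import Counter

my_list = [[12, 34], [34, 12], [56, 78], [78, 90]]

# sorted() yields the same tuple for [12, 34] and [34, 12]
c = Counter(tuple(sorted(pair)) for pair in my_list)
number = sum(v > 1 for v in c.values())

print(number)  # 1
```

Sorting a two-element pair is cheap, and this avoids the `my_list[:i]` slice and membership test on every iteration.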

Answer:

You can build a deduplicated copy of your list, then count how many of its elements occur more than once in the original list:

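ls = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]

# build a list of the unique sublists
uni = []
for l in ls:
    if l not in uni:
        uni.append(l)

# count how many unique sublists occur more than once in the original list
c = 0
for l in uni:
    if ls.count(l) > 1:
        c += 1

print(c)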

Output: 2

Answer:

MY_LIST = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]

Try this: first build a list of the unique sublists (adapted from the question "Removing duplicates from a list of lists"):

uniques = []
for elem in MY_LIST:
    if elem not in uniques:
        uniques.append(elem)
print(uniques,'\n') 

   

and then count how many times each unique sublist appears:

repeated = {}
for elem in uniques:
    counter = 0
    for _elem in MY_LIST:
        if elem == _elem:
            counter += 1
    if counter > 1:
        repeated[str(elem)] = counter

print('amount of repeated sublists: {}'.format(len(repeated)))
print(repeated)

output:

[[123, 456], [111, 111], [222, 222], [333, 333]] 

amount of repeated sublists: 2
{'[123, 456]': 3, '[222, 222]': 2}
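The nested loop above compares every unique sublist against the whole list. A single-pass sketch with a plain dict (no imports needed) produces the same counts; keying by tuple instead of `str(elem)` is a choice made here for illustration, not part of the answer above:

```python
MY_LIST = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]

counts = {}
for elem in MY_LIST:
    key = tuple(elem)  # hashable stand-in for the sublist
    counts[key] = counts.get(key, 0) + 1

# Keep only the sublists that occur more than once
repeated = {k: v for k, v in counts.items() if v > 1}

print('amount of repeated sublists: {}'.format(len(repeated)))  # amount of repeated sublists: 2
print(repeated)  # {(123, 456): 3, (222, 222): 2}
```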