Remove duplicate items across sublists of a Python list

Time:03-21

I want to remove items that are duplicated across the sublists of a list in Python, keeping only the first occurrence of each.

Example:

  • myList = [[1,2,3], [4,5,6,3], [7,8,9], [0,2,4]]

to

  • myList = [[1,2,3], [4,5,6], [7,8,9], [0]]

I tried this code:

myList = [[1,2,3],[4,5,6,3],[7,8,9], [0,2,4]]

nbr = []

for x in myList:
    for i in x:
        if i not in nbr:
            nbr.append(i)
        else:
            x.remove(i)

But some duplicate items are not removed.

The result is: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 4]]

The number 4 still appears twice.

CodePudding user response:

You can make this much faster by:

  1. Using a set for repeated membership testing instead of a list, and
  2. Rebuilding each sublist rather than repeatedly calling list.remove() (a linear-time operation, each time) in a loop.

seen = set()

for i, sublist in enumerate(myList):
    new_list = []

    for x in sublist:
        if x not in seen:
            seen.add(x)
            new_list.append(x)

    myList[i] = new_list

>>> print(myList)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]

If you want mild speed gains and moderate readability loss, you can also write this as:

seen = set()

for i, sublist in enumerate(myList):
    myList[i] = [x for x in sublist if not (x in seen or seen.add(x))]
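The one-liner works because set.add() returns None, so the `or` expression is falsy the first time an item is seen and truthy afterwards. A minimal sketch of that behavior (the variable names here are illustrative only, and items are assumed to be hashable):

```python
seen = set()

# set.add() mutates the set and returns None, which is falsy.
add_result = seen.add(1)

# 2 is new: `2 in seen` is False, so `seen.add(2)` runs and returns None.
first_time = not (2 in seen or seen.add(2))
# 2 is known: `2 in seen` is True, so `or` short-circuits and add() never runs.
second_time = not (2 in seen or seen.add(2))

print(add_result, first_time, second_time)  # None True False
```

The side effect hidden inside the condition is what costs readability; the explicit loop does the same thing in separate steps.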

CodePudding user response:

You are iterating over a list while also modifying it:

...
    for i in x:
        ...
        x.remove(i)

Removing an element shifts the remaining elements one position to the left, so the loop can skip the element that follows the removed one.

The solution is to create a shallow copy of the list and iterate over that while modifying the original list:

...
    for i in x.copy():
        ...
        x.remove(i)
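Filled in with the code from the question, the copy-based fix might look like the sketch below. It keeps the original names (myList, nbr) and changes only the inner loop.

```python
myList = [[1, 2, 3], [4, 5, 6, 3], [7, 8, 9], [0, 2, 4]]

nbr = []  # values seen so far, in order of first appearance

for x in myList:
    # Iterate over a shallow copy so removals from x cannot shift
    # the sequence the loop is walking.
    for i in x.copy():
        if i not in nbr:
            nbr.append(i)
        else:
            x.remove(i)

print(myList)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
```

Note that list.remove() deletes the first matching value, so if a single sublist contains the same value twice, the surviving copy may not be the one in the original first position.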

CodePudding user response:

Why you got the wrong answer: in your code, after scanning the first three sublists, nbr = [1, 2, 3, 4, 5, 6, 7, 8, 9]. Now x = [0, 2, 4]. The duplicate is detected when i = x[1] (the value 2), and removing it leaves x = [0, 4]. The loop then advances to index 2, which no longer exists, so it stops without ever checking 4.
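The skip can be reproduced in isolation. This small sketch records which values the loop actually visits (the `visited` list is added purely for illustration):

```python
x = [0, 2, 4]
visited = []  # values the for loop actually reaches

for i in x:
    visited.append(i)
    if i == 2:
        # Removing 2 shifts 4 from index 2 to index 1,
        # but the loop's internal index still moves on to 2.
        x.remove(i)

print(visited)  # [0, 2]
print(x)        # [0, 4]
```

The value 4 is never visited, which is why it survives in the last sublist.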

Optimizations have been proposed in the other answers. In general, a list is efficient only for access by index and for appending or removing at the end; membership tests and removals elsewhere take linear time.
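As a rough way to see the difference on your own machine, the sketch below times a membership test on a list against the same test on a set. The size and repeat count are arbitrary assumptions, and the actual timings will vary between systems.

```python
import timeit

n = 10_000  # arbitrary size chosen for illustration
data_list = list(range(n))
data_set = set(data_list)
missing = -1  # not present, so the list has to compare every element

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: missing in data_list, number=200)
set_time = timeit.timeit(lambda: missing in data_set, number=200)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```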
