I want to remove duplicate items from lists in sublists on Python.
Exemple :
- myList = [[1,2,3], [4,5,6,3], [7,8,9], [0,2,4]]
to
- myList = [[1,2,3], [4,5,6], [7,8,9], [0]]
I tried with this code :
myList = [[1,2,3],[4,5,6,3],[7,8,9], [0,2,4]]
nbr = []
for x in myList:
for i in x:
if i not in nbr:
nbr.append(i)
else:
x.remove(i)
But some duplicate items are not deleted.
Like this : [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 4]]
I still have the number 4 that repeats.
CodePudding user response:
You can make this much faster by:
- Using a
set
for repeated membership testing instead of a list, and - Rebuilding each sublist rather than repeatedly calling
list.remove()
(a linear-time operation, each time) in a loop.
seen = set()
for i, sublist in enumerate(myList):
new_list = []
for x in sublist:
if x not in seen:
seen.add(x)
new_list.append(x)
myList[i] = new_list
>>> print(myList)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
If you want mild speed gains and moderate readability loss, you can also write this as:
seen = set()
for i, sublist in enumerate(myList):
myList[i] = [x for x in sublist if not (x in seen or seen.add(x))]
CodePudding user response:
You iterate over a list that you are also modifying:
...
for i in x:
...
x.remove(i)
That means that it may skip an element on next iteration.
The solution is to create a shallow copy of the list and iterate over that while modifying the original list:
...
for i in x.copy():
...
x.remove(i)
CodePudding user response:
Why you got wrong answer: In your code, after scanning the first 3 sublists, nbr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
. Now x = [0, 2, 4]
. Duplicate is detected when i = x[1]
, so x = [0, 4]
. Now i
move to x[2]
which stops the for loop.
Optimization has been proposed in other answers. Generally, 'list' is only good for retrieving element and appending/removing at the rear.