How to remove duplicate lists from a 'list of lists'?

I have a list of lists (I'm relatively new to Python so excuse me if the terms are inaccurate, but look at the example below) and want to remove any duplicate lists.

In this example, entries 1 and 4 are identical, as are entries 3 and 5, and one of each pair should be removed.

List = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

Expected result:

[[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

I currently use the following for loop to read through the list and remove duplicates, but it is very slow. My real code is much longer and the input list is much more complicated than in this example, and the script runs for days.

unique = []
for i in cohesiveFaceNodes:
    if not i in unique:
        unique.append(i)
cohesiveFaceNodes = unique

CodePudding user response：

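De-duplicating while preserving order (this relies on dicts keeping insertion order, which holds in CPython 3.6 and is guaranteed by the language from Python 3.7):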

>>> lst = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], 
...        [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]
>>> [list(x) for x in dict.fromkeys(map(tuple, lst))]
[[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]
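
To see why this works, the one-liner can be unpacked into its three steps. This is only an illustrative breakdown of the same expression; the intermediate names (as_tuples, unique_keys, result) are made up for the example.

```python
lst = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4],
       [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

# Step 1: lists are unhashable, so turn each inner list into a tuple.
as_tuples = map(tuple, lst)

# Step 2: dict keys are unique and keep insertion order, so building a dict
# from the tuples drops later duplicates and keeps first occurrences in order.
unique_keys = dict.fromkeys(as_tuples)

# Step 3: convert the surviving tuples back into lists.
result = [list(t) for t in unique_keys]

print(result)
```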

CodePudding user response：

If you can convert the inner lists into tuples, there is a simple one-liner for this:

# use a list of tuples instead of a list of lists for this method to work
input_list = [(1, 'A', 6, 2), (8, 'C', 6, 2), (3, 'G', 3, 4), (1, 'A', 6, 2), (3, 'G', 3, 4), (3, 'B', 3, 4)]
deduped_list = list(dict.fromkeys(input_list))  # remove dupes, return new list of tuples

Edit: a quick way to convert your existing list of lists to a list of tuples is a list comprehension:

input_list = [tuple(e) for e in input_list]

Edit 2: if for some reason you really need a list of lists afterwards, a list comprehension does that too:

final_list = [list(e) for e in deduped_list]

CodePudding user response：

Testing whether something is an element of a list (i in unique) is expensive: Python compares against the list element by element until it finds a match or the list is exhausted. Each check therefore takes time proportional to the length of the list, and the whole loop is quadratic. A set is much more efficient for membership checks (constant time on average), so making unique a set rather than a list would help.

Now there's a small hurdle: Python sets don't support lists as members, because lists are mutable and not hashable. Assuming the elements in each of the inner lists are hashable, though, you can convert them to Python tuples (which are similar to lists but immutable) and then they can be elements of sets.
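
The hashability point can be checked directly. A small sketch, using a made-up variable name row for one inner list:

```python
row = [1, 'A', 6, 2]

try:
    {row}  # a set containing a list is rejected
except TypeError as exc:
    error = str(exc)
    print(error)

seen = {tuple(row)}        # the tuple version is accepted
print(tuple(row) in seen)  # membership test against the set
```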

So one solution could be (I'm reusing the original variable names, though I think they're not ideal and I recommend changing them):

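unique = set()  # tuples already seen; fast membership checks
result = []     # de-duplicated lists, in original order
for i in cohesiveFaceNodes:
    i_as_tuple = tuple(i)  # lists are unhashable, tuples are not
    if i_as_tuple not in unique:
        unique.add(i_as_tuple)
        result.append(i)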
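
The loop above can be wrapped in a small function and checked against the original list-based loop. This is only a sketch: the function names dedupe_with_list and dedupe_with_set are made up here, and it assumes every element of the inner lists is hashable.

```python
def dedupe_with_list(rows):
    """Original approach: membership test against a list (quadratic)."""
    unique = []
    for row in rows:
        if row not in unique:
            unique.append(row)
    return unique


def dedupe_with_set(rows):
    """Set-based approach: membership test against a set of tuples."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            result.append(row)  # keep the original list object
    return result


rows = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4],
        [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

# Both approaches keep the first occurrence of each row, in order.
assert dedupe_with_set(rows) == dedupe_with_list(rows)
print(dedupe_with_set(rows))
```

Both functions return the same result; they differ only in how the running time grows with the size of the input.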

CodePudding user response：

For better coding practice and readability, it may be better to use a dataclass to store this data. You can explicitly name each entry in the inner list for more clarity. A dataclass offers built-in equality comparison, like the tuples in the other answers. It needs frozen=True so that instances are also hashable and can be used as dict keys.

from dataclasses import dataclass

# frozen=True makes instances immutable and hashable
@dataclass(frozen=True)
class AClass:
    some_int: int
    some_chr: str
    int2: int
    int3: int


lst = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4],
       [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

# unpack each inner list into the four fields
new_lst = [AClass(*x) for x in lst]
deduped_list = list(dict.fromkeys(new_lst))
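
If plain lists are needed again afterwards, the dataclass instances can be converted back. A minimal sketch, assuming a frozen dataclass (so that instances are hashable) with the hypothetical name Row and made-up field names; astuple comes from the standard dataclasses module.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)  # frozen=True makes instances hashable
class Row:
    some_int: int
    some_chr: str
    int2: int
    int3: int


lst = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4],
       [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

rows = [Row(*x) for x in lst]        # unpack each inner list into the fields
deduped = list(dict.fromkeys(rows))  # drop duplicates, keep first occurrences

# astuple returns the field values in declaration order
back_to_lists = [list(astuple(r)) for r in deduped]
print(back_to_lists)
```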

CodePudding user response：

List = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

result = [] 
for i in List: 
    if i not in result: 
        result.append(i) 

print(result)