find unique lists inside another list in an efficient way

Time:05-14

solution = [[1,0,0],[0,1,0], [1,0,0], [1,0,0]]

I have the above nested list, which contains other lists inside it. How do I get the unique lists inside solution?

output = [[1,0,0],[0,1,0]]

Note: each inner list is the same size

Things I have tried :

  1. Take each list and compare it with all the other lists to see whether it is a duplicate, but this is very slow.

How can I check, before inserting a list, whether there is already a duplicate of it, so as to avoid inserting duplicates?

CodePudding user response:

If you don't care about the order, you can use set:

solution = [[1,0,0],[0,1,0],[1,0,0],[1,0,0]]

output = set(map(tuple, solution))
print(output) # {(1, 0, 0), (0, 1, 0)}
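If the result needs to be a list of lists again (order still not guaranteed, since sets are unordered), a small follow-up sketch converting the tuples back:

```python
solution = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]

# Tuples are hashable, so they can go into a set; convert back to lists after.
unique_lists = [list(t) for t in set(map(tuple, solution))]
print(unique_lists)  # e.g. [[1, 0, 0], [0, 1, 0]] (set order is arbitrary)
```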

CodePudding user response:

Since lists are mutable objects, they are not hashable and can't be placed in a set directly. You could convert each one to a tuple, however, and store the tuple-ized view of each list in a set.

Tuples are heterogeneous immutable containers, unlike lists, which are mutable and idiomatically homogeneous.

from typing import List, Any

def de_dupe(lst: List[List[Any]]) -> List[List[Any]]:
    seen = set()
    output = []
    for element in lst:
        tup = tuple(element)
        if tup in seen:
            continue  # we've already added this one
        seen.add(tup)
        output.append(element)
    return output

solution = [[1,0,0],[0,1,0], [1,0,0], [1,0,0]]
assert de_dupe(solution) == [[1, 0, 0], [0, 1, 0]]
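As a side note, the same seen-set logic can be compressed into a one-liner using dict.fromkeys, since dicts preserve insertion order in Python 3.7+; a minimal sketch:

```python
solution = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]

# dict keys are unique and keep first-seen order, so this deduplicates
# while preserving the original ordering of the rows.
unique = [list(t) for t in dict.fromkeys(map(tuple, solution))]
print(unique)  # [[1, 0, 0], [0, 1, 0]]
```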

CodePudding user response:

Pandas duplicated might be of help.

import pandas as pd
df=pd.DataFrame([[1,0,0],[0,1,0], [1,0,0], [1,0,0]])
d = df[~df.duplicated()].values.tolist()

Output

[[1, 0, 0], [0, 1, 0]]

Or, since you tagged multidimensional-array, you can use a numpy approach.

import numpy as np
def unique_rows(a):
    a = np.ascontiguousarray(a)
    unique_a = np.unique(a.view([('', a.dtype)]*a.shape[1]))
    return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))
arr=np.array([[1,0,0],[0,1,0], [1,0,0], [1,0,0]])
output=unique_rows(arr).tolist()

Based on the suggestion in this OP

CodePudding user response:

Try this solution:

x=[[1,0,0],[0,1,0], [1,0,0], [1,0,0]]

Import numpy and convert the nested list into a numpy array

import numpy as np

a1=np.array(x)

Find the unique rows:

a2 = np.unique(a1,axis=0)

Convert it back to a nested list

a2.tolist()

Hope this helps

CodePudding user response:

While lists are not hashable, and are therefore inefficient to deduplicate, tuples are. So one way would be to transform your lists into tuples and deduplicate those.

>>> solution_tuples = [(1,0,0), (0,1,0), (1,0,0), (1,0,0)]
>>> set(solution_tuples)
{(1, 0, 0), (0, 1, 0)}