I have a large table, a couple of million pairs of ints ([[1,2],[45,101],[22,222], etc.). What is the quickest way in Python to remove duplicates?
Creating an empty list and appending with an "if not in" check doesn't work, since it takes ages. Converting to NumPy and using "isin", I can't seem to get it to work on pairs.
CodePudding user response:
You can convert each pair to a tuple (tuples are hashable, lists are not) and build a set:
arr = [[1,2],[45,101],[22,222], [1,2]]
arr = set(tuple(i) for i in arr)
If you want to convert it back to a list of lists:
arr = [list(i) for i in arr]
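Note that a set does not preserve insertion order. If order matters, one alternative is dict.fromkeys(), which keeps the first occurrence of each pair; a minimal sketch:
arr = [[1, 2], [45, 101], [22, 222], [1, 2]]
# dict keys are unique and (since Python 3.7) preserve insertion order
arr = [list(t) for t in dict.fromkeys(tuple(p) for p in arr)]
# arr == [[1, 2], [45, 101], [22, 222]]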
CodePudding user response:
You could use np.unique():
import numpy as np

np.unique([[1,2],[45,101],[22,222],[22,222]], axis=0)
Output:
array([[  1,   2],
       [ 22, 222],
       [ 45, 101]])
Note that np.unique() sorts the rows, so the original order is not preserved.
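If you need to keep the order of first appearance, np.unique() can also return the index of each row's first occurrence via return_index, which you can then sort; a minimal sketch:
import numpy as np

arr = np.array([[1, 2], [45, 101], [22, 222], [22, 222]])
# return_index gives the index of each unique row's first occurrence in arr
_, idx = np.unique(arr, axis=0, return_index=True)
# sorting the indices restores the original order of first appearance
deduped = arr[np.sort(idx)]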
CodePudding user response:
Probably going to be this: list(set(my_list))
Edit: Whoops, that raises TypeError: unhashable type: 'list', since lists can't go in a set. In any case, if whatever is iterating over said list can perform the task of detecting duplicates itself, that would be faster than removing duplicates beforehand.
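For completeness, the one-liner does work once the inner lists are converted to hashable tuples:
my_list = [[1, 2], [45, 101], [22, 222], [1, 2]]
# tuples are hashable, so the pairs can be deduplicated in a set
deduped = [list(t) for t in set(map(tuple, my_list))]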