Quick way to remove duplicate objects in a List in Python

Time:11-07

I have a list of MyClass objects which is made like so:

from typing import List

# The class is MyClass(string_a: str = None, string_b: str = None)
test_list: List[MyClass] = []
test_list.append(MyClass("hello", "world"))
test_list.append(MyClass("hello", ""))
test_list.append(MyClass("hello", "world"))
test_list.append(MyClass(None, "world"))

I want the end result to only have the 3rd append removed:

# Remove test_list.append(MyClass("hello", "world"))

This is just a sample; the real list may be empty or contain n objects. Is there a quick way to remove the duplicates, or better, a quick way to tell whether an object already exists before appending it?

CodePudding user response:

If your objects are of primitive (hashable built-in) types such as str or int, you can use set:

list(set(test_list))
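One caveat worth illustrating: set() gives no ordering guarantee, so the result may not follow the order of the original list. A minimal sketch, assuming the elements are hashable built-ins such as strings (the sample values are made up); dict.fromkeys is one common way to deduplicate while keeping the first occurrence of each value:

```python
# Sample data, made up for illustration.
words = ["hello", "world", "hello", "", "world"]

# set() removes duplicates but does not promise any particular order.
unordered = list(set(words))

# dict keys are unique and keep insertion order (Python 3.7+),
# so this keeps the first occurrence of each value.
ordered = list(dict.fromkeys(words))

print(sorted(unordered))
print(ordered)
```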

If not, as in your case, you have two options:

1- Implement __hash__() & __eq__()

You have to implement __hash__() and __eq__() in your class in order to use set() to remove the duplicates.

See the example below:

class MyClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"MyClass({self.x} - {self.y})"

    def __hash__(self):
        return hash((self.x, self.y))

    def __eq__(self, other):
        if self.__class__ != other.__class__:
            return NotImplemented

        return (
            self.x == other.x and
            self.y == other.y
        )

l = []

l.append(MyClass('hello', 'world'))
l.append(MyClass('hello', 'world'))
l.append(MyClass('', 'world'))
l.append(MyClass(None, 'world'))

print(list(set(l)))

Because equality depends on more than one attribute, __hash__() hashes a tuple of those attributes, so objects that compare equal also produce the same hash.

__repr__() is only there to give the objects a readable string representation when they are printed.
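Since the question asks to drop only the later duplicate and also about checking before appending, here is a minimal sketch of both, assuming a class shaped like the one in the question (the attribute names string_a and string_b are taken from the question's comment; everything else is illustrative). It relies on the same __hash__()/__eq__() pair described above:

```python
class MyClass:
    def __init__(self, string_a=None, string_b=None):
        self.string_a = string_a
        self.string_b = string_b

    def __repr__(self):
        return f"MyClass({self.string_a!r}, {self.string_b!r})"

    def __hash__(self):
        # Hash the same attributes that __eq__ compares.
        return hash((self.string_a, self.string_b))

    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        return (self.string_a, self.string_b) == (other.string_a, other.string_b)


items = [
    MyClass("hello", "world"),
    MyClass("hello", ""),
    MyClass("hello", "world"),  # duplicate of the first item
    MyClass(None, "world"),
]

# Option A: deduplicate after the fact, keeping first-seen order.
deduped = list(dict.fromkeys(items))

# Option B: check before appending, using a set for fast membership tests.
seen = set()
result = []
for item in items:
    if item not in seen:
        seen.add(item)
        result.append(item)

print(deduped)
print(result)
```

Both options keep the first occurrence and preserve order, unlike list(set(...)).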

2- Use a third-party package

Check out a package called toolz,

then use its unique() function to remove the duplicates by passing a key (it returns a lazy iterator, so wrap it in list() if you need a list):

toolz.unique(test_list, key=lambda x: x.your_attribute)

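In your case there is more than one attribute, so combine them into a single key, for example by adding a property that concatenates them and passing it to toolz.unique(), or by concatenating them on the fly as below. Note that string concatenation raises TypeError if an attribute is None (as in your sample) and can make different pairs collide; a tuple key such as (x.first_attribute, x.second_attribute) avoids both problems.

toolz.unique(test_list, key=lambda x: x.first_attribute + x.second_attribute)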
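The same key-based idea can be sketched in plain Python if adding a dependency is not desirable. unique_by below is a hypothetical helper, not part of toolz, and the objects are stand-ins built with SimpleNamespace; first_attribute and second_attribute are the placeholder names used above. It uses a tuple as the key, which also works when an attribute is None:

```python
from types import SimpleNamespace


def unique_by(items, key):
    """Yield items whose key has not been seen before, in original order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Stand-in objects with two placeholder attributes.
objs = [
    SimpleNamespace(first_attribute="hello", second_attribute="world"),
    SimpleNamespace(first_attribute="hello", second_attribute=""),
    SimpleNamespace(first_attribute="hello", second_attribute="world"),
    SimpleNamespace(first_attribute=None, second_attribute="world"),
]

# A tuple key compares both attributes without concatenating them.
unique_objs = list(
    unique_by(objs, key=lambda x: (x.first_attribute, x.second_attribute))
)
print(len(unique_objs))
```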

CodePudding user response:

You can use set to remove duplicates from a list, but that won't work out of the box because you are using your own class. For set to work, the class must implement the __eq__ dunder method, which compares objects by the values they store, together with a matching __hash__ (a class that defines __eq__ without __hash__ becomes unhashable). A simple example of __eq__ would be:

def __eq__(self, obj, *args, **kwargs):
    return isinstance(obj, MyClass) and obj.first_name == self.first_name and ..

Once your class has this, along with a __hash__ method consistent with it, you can use set(test_list) to remove duplicates.
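A short sketch of why __eq__ on its own is not enough: when a class defines __eq__ without __hash__, Python sets its __hash__ to None, so instances cannot be placed in a set. The class names OnlyEq and EqAndHash are made up for illustration; first_name is the attribute from the snippet above:

```python
class OnlyEq:
    def __init__(self, first_name):
        self.first_name = first_name

    def __eq__(self, other):
        return isinstance(other, OnlyEq) and other.first_name == self.first_name


class EqAndHash:
    def __init__(self, first_name):
        self.first_name = first_name

    def __eq__(self, other):
        return isinstance(other, EqAndHash) and other.first_name == self.first_name

    def __hash__(self):
        # Hash the same value that __eq__ compares.
        return hash(self.first_name)


# Defining only __eq__ makes instances unhashable, so set() raises TypeError.
try:
    set([OnlyEq("a"), OnlyEq("a")])
    only_eq_error = None
except TypeError as exc:
    only_eq_error = exc

# With both methods defined, equal objects collapse into one set element.
unique_people = set([EqAndHash("a"), EqAndHash("a")])

print(only_eq_error)
print(len(unique_people))
```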
