Home > Back-end >  Does Python's standard library have a data structure that's a list that only allows unique
Does Python's standard library have a data structure that's a list that only allows unique

Time:10-22

For convenience, I want a list that only allows unique items to be added.

class UniqueList(list):
    def append(self, new_item):
        for existing_item in self:
            if new_item is existing_item:
                return
        super().append(new_item)

This code seems to work perfectly, despite being somewhat inefficient - each time you add an item, it needs to do a search through all of its items to see if it's already in there. It's more efficient, but less convenient, to just filter out duplicates once, at the end, when you're doing adding all your items, but in this case, I prefer convenience over efficiency (or I wouldn't be using Python).

Does this already exist in the standard library somewhere?

A set doesn't allow values that can't be hashed, which I frequently deal with, or I'd use a set.

It's also possible, but not always convenient, to remove duplicates after adding all your items to the list (for example with list comprehension). I find the above code to be more convenient, and safe, than remembering to remove duplicate items at the end.

To be clear, all I want to know is if I should keep writing the code above in my projects when I need it, or if this already exists in the standard library, and I just need to import it.

My code doesn't take into consideration things like modifying one element to be equal to another without using the append method, which can result in duplicates in the UniqueList instance, which I'm hoping the Python standard library's version of my UniqueList would prevent (if there is one).

CodePudding user response:

I don't know of anything in the standard python library or any other python libraries that do this.

As pointed out from some comments you can run into some problems if you accidentally mix a list and a UniqueList.

Probably the best/most efficient solution for this use case is to write your own class that consists of a set (for hashable items) and a list (for unhashable items). This could be a hassle as you would need to implement loads of methods such as __iter__, append (or add) and so on.

If that seems to much work, just use your UniqueList and be careful when using it.

Also, both of the solutions I mentioned still have a problem. Imagine that your data type has two lists inside: [2] and [2, 3]. If you access the first list and append a 3 to it you would end up with [2, 3] and [2, 3]. I think this could be mitigated if you implement your own class by overriding __setattr__, but I don't think it is that trivial to do so.

  • Related