Simple way to remove empty sets from dict

Time:03-18

There's a common problem where I need to keep track of a bunch of collections in a dictionary. Let's say I want to keep track of which items I borrowed from my friends. The defaultdict class is quite useful to do this:

from collections import defaultdict

d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')

# defaultdict(<class 'set'>, {'Peter': {'salt'}, 'Eric': {'jacket', 'car'}})

This allows me to add items to the respective sets without worrying whether a key is already in the dictionary. Now suppose I return the salt to Peter. I owe him nothing, so he can be removed from the dictionary. Doing this is slightly more cumbersome:

d['Peter'].remove('salt')
if not d['Peter']:
    del d['Peter']

I know I could put this in some function, but for readability I would like a class that removes the key automatically if the corresponding set is empty. Is there some way to do this?

Edit

Okay, I realize there is a pretty major problem with this idea when I try to solve it using inheritance and overriding the indexing method. When calling d[index], the value is returned before .remove(something) is called on it, which makes it impossible for the dictionary to know that the set has been emptied. I'm guessing there's no way around this other than using something different.
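The order of operations described in the edit can be observed with a small tracing subclass. The name TracingDict and its log attribute are hypothetical, made up for this sketch: it records a snapshot of what each d[key] lookup returns, which happens before the set method (add or remove) runs on the returned object.

```python
from collections import defaultdict

class TracingDict(defaultdict):
    """Hypothetical defaultdict subclass that records what __getitem__ returns."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.log = []  # (key, snapshot of the value) for every d[key] lookup

    def __getitem__(self, key):
        value = super().__getitem__(key)  # may call __missing__ and insert set()
        # Store a copy so later mutations of the set don't alter the log entry
        self.log.append((key, set(value)))
        return value

d = TracingDict(set)
d['Peter'].add('salt')
d['Peter'].remove('salt')  # __getitem__ finishes first, then set.remove runs

print(d.log)            # [('Peter', set()), ('Peter', {'salt'})]
print(dict(d.items()))  # {'Peter': set()}: the empty set is still stored
```

By the time remove() empties the set, __getitem__ has already returned, so the dictionary itself never gets a chance to react.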

Answer:

The problem with using a defaultdict to do what you want is that merely accessing a missing key inserts that key, with a value created by the factory function. Consider:

from collections import defaultdict

d = defaultdict(set)

if d["Peter"]:
    print("I owe something to Peter")

print(d)
# defaultdict(<class 'set'>, {'Peter': set()})
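If you stay with a plain defaultdict, a membership test or dict.get() lets you check for a key without creating it, because only indexing (__getitem__) calls the default factory. A short sketch:

```python
from collections import defaultdict

d = defaultdict(set)

# Neither a membership test nor .get() calls the default factory,
# so these checks leave the dictionary untouched.
if 'Peter' in d and d['Peter']:  # d['Peter'] is skipped because 'Peter' in d is False
    print("I owe something to Peter")
if d.get('Peter'):               # .get() returns None for a missing key
    print("I owe something to Peter")

snapshot = dict(d)
print(snapshot)  # {}

# Plain indexing, by contrast, runs the factory and stores an empty set.
d['Peter']
print(dict(d))   # {'Peter': set()}
```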

Also, as you've realized, the problem with creating a subclass is that the __getitem__() method is called before the set is ever emptied, so you'd have to call another function that checks whether the set is empty and removes the key.
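A minimal sketch of that "another function" approach, written as a method on a subclass. The class name SetDict and the method name discard_item are hypothetical; the method uses set.discard() rather than set.remove(), so removing an item that isn't there is silently ignored instead of raising KeyError.

```python
from collections import defaultdict

class SetDict(defaultdict):
    """Hypothetical defaultdict subclass with a helper for removing items."""

    def discard_item(self, key, item):
        """Discard item from the set at key; delete the key if the set becomes empty."""
        bucket = self.get(key)  # .get() does not create an entry for a missing key
        if bucket is None:
            return
        bucket.discard(item)    # unlike set.remove(), no KeyError if item is absent
        if not bucket:
            del self[key]

d = SetDict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')

d.discard_item('Peter', 'salt')
print('Peter' in d)  # False
print(dict(d))       # {'Eric': {'car', 'jacket'}} (set order may vary)
```

The cost is that callers must remember to use discard_item() instead of d[key].remove(item); indexing still works exactly as in a normal defaultdict.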

A better idea might be to simply leave out keys with empty sets when creating the string representation:

from collections import defaultdict

class NewDefaultDict(defaultdict):
    def __repr__(self):
        # Only include entries whose value is non-empty (truthy)
        items = ", ".join(f"{k!r}: {v!r}" for k, v in self.items() if v)
        return f"NewDefaultDict({self.default_factory!r}, {{{items}}})"

nd = NewDefaultDict(set)
nd["Peter"].add("salt")
nd["Paul"].add("pepper")
nd["Paul"].remove("pepper")

print(nd)
# NewDefaultDict(<class 'set'>, {'Peter': {'salt'}})

You would also need to redefine __contains__() to check if the value is empty, so that e.g. "Paul" in nd returns False:

    def __contains__(self, key):
        # True only if the key is stored and its set is non-empty
        return defaultdict.__contains__(self, key) and bool(self[key])

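To make for ... in nd loops (and other plain iteration, such as list(nd)) skip the empty entries, you can redefine __iter__(). Note that keys(), items(), len() and dict-unpacking such as {**nd} do not go through this method, so they still see the empty entries unless you override them as well: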

    def __iter__(self):
        # Yield only keys whose set is non-empty
        for key in defaultdict.__iter__(self):
            if self[key]:
                yield key

Then,

for k in nd:
    print(k)

gives:

Peter
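For reference, the three overrides from this answer combined into one runnable class. The final line is an added check showing that len() is not overridden here, so the emptied 'Paul' entry is still counted.

```python
from collections import defaultdict

class NewDefaultDict(defaultdict):
    """defaultdict that hides entries whose value is empty (falsy)."""

    def __repr__(self):
        # Build the representation from the stored items, skipping empty values
        items = ", ".join(f"{k!r}: {v!r}" for k, v in defaultdict.items(self) if v)
        return f"NewDefaultDict({self.default_factory!r}, {{{items}}})"

    def __contains__(self, key):
        # Present only if the key is stored and its value is non-empty
        return defaultdict.__contains__(self, key) and bool(self[key])

    def __iter__(self):
        # Iterate over the stored keys, yielding only those with non-empty values
        for key in defaultdict.__iter__(self):
            if self[key]:
                yield key

nd = NewDefaultDict(set)
nd["Peter"].add("salt")
nd["Paul"].add("pepper")
nd["Paul"].remove("pepper")

print(nd)            # NewDefaultDict(<class 'set'>, {'Peter': {'salt'}})
print("Paul" in nd)  # False
print(list(nd))      # ['Peter']
print(len(nd))       # 2: len() is not overridden, so 'Paul' is still counted
```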

Answer:

A dictionary comprehension might be useful.

from collections import defaultdict

d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')

d['Peter'].remove('salt')

d2 = {k: v for k, v in d.items() if len(v) > 0}

The d2 dictionary is now:

{'Eric': {'car', 'jacket'}}

Alternatively, you can rely on the fact that an empty set is falsy in Python:

d2 = {k: v for k, v in d.items() if v}
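Both comprehensions build a new plain dict (d2). If you would rather prune the original defaultdict in place, so that it keeps its default factory, one minimal sketch is to collect the empty keys first and then delete them. Deleting entries while iterating over d.items() directly would raise a RuntimeError because the dictionary changes size during iteration.

```python
from collections import defaultdict

d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')
d['Peter'].remove('salt')

# Materialize the list of empty keys before mutating the dictionary
empty_keys = [k for k, v in d.items() if not v]
for k in empty_keys:
    del d[k]

print(dict(d))  # {'Eric': {'car', 'jacket'}} (set order may vary)
```

Because d is still a defaultdict, d['Peter'].add(...) keeps working afterwards.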

As in the other answer, you can also define a class that implements this logic by ignoring keys/values whose value meets some criterion. The criterion is a function passed in through the ignore parameter.

from collections import defaultdict

class default_ignore_dict(defaultdict):

    def __init__(self, factory, ignore, *args, **kwargs):
        defaultdict.__init__(self, factory, *args, **kwargs)
        self.ignore = ignore

    def __contains__(self, key):
        # A key counts as present only if its value is not ignored
        return defaultdict.__contains__(self, key) and not self.ignore(self[key])

    def items(self):
        # Note: returns a generator, not a dict view like dict.items()
        return ((k, v) for k, v in defaultdict.items(self) if not self.ignore(v))

    def keys(self):
        return (k for k, _ in self.items())

    def values(self):
        return (v for _, v in self.items())

Testing this:

>>> d = default_ignore_dict(set, lambda s: not s)
>>> d['Peter'].add('salt')
>>> d['Peter'].remove('salt')
>>> d['Eric'].add('car')
>>> d['Eric'].add('jacket')
>>> 
>>> 'Peter' in d
False
>>> list(d.items())
[('Eric', {'car', 'jacket'})]
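The class filters membership tests and items(), keys() and values(), but other dict operations still go to the underlying dict. A quick check of what is and isn't filtered (the class is repeated so the snippet runs on its own):

```python
from collections import defaultdict

class default_ignore_dict(defaultdict):

    def __init__(self, factory, ignore, *args, **kwargs):
        defaultdict.__init__(self, factory, *args, **kwargs)
        self.ignore = ignore  # predicate: True means "hide this value"

    def __contains__(self, key):
        return defaultdict.__contains__(self, key) and not self.ignore(self[key])

    def items(self):
        return ((k, v) for k, v in defaultdict.items(self) if not self.ignore(v))

    def keys(self):
        return (k for k, _ in self.items())

    def values(self):
        return (v for _, v in self.items())

d = default_ignore_dict(set, lambda s: not s)
d['Peter'].add('salt')
d['Peter'].remove('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')

print('Peter' in d)    # False (filtered by __contains__)
print(list(d.keys()))  # ['Eric'] (filtered by keys())
print(list(d))         # ['Peter', 'Eric'] (__iter__ is not overridden)
print(len(d))          # 2 (__len__ is not overridden)
```

Overriding __iter__ as in the other answer would fix plain iteration; len() would additionally need its own override if it should match.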
>>>