filter dictionary elements of a list with same value in the combinations of two keys

Time:06-03

I would like to remove duplicate elements from a list of dictionaries. Two elements count as duplicates when the combination of two specific key values (weather_1 and weather_2) is the same, regardless of which key holds which value. For example:

[{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'cold', 'name': 'james'},
{'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]

->

[{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
 {'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
 {'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]

CodePudding user response:

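Extract weather_1 and weather_2 from each element, sort those two values, and use the sorted pair as a dict key with the element as its value. Sorting makes the key independent of which field holds which value, and only the first element seen for each key is stored.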

arr = [{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
       {'weather_1': 'hot', 'weather_2': 'cold', 'name': 'james'},
       {'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
       {'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'},
       {'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]

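d = {}
for x in arr:
    # Sorting makes the key order-insensitive: both orderings give ('cold', 'hot')
    k = tuple(sorted([x["weather_1"], x["weather_2"]]))
    if k not in d:  # keep only the first element seen for each key
        d[k] = x
list(d.values())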
# [{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
#  {'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
#  {'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]
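
The same idea can be wrapped in a reusable function. The following is only a sketch: the name dedupe_first_seen and the fields parameter are made up for illustration, and it relies on dicts preserving insertion order (Python 3.7+).

```python
def dedupe_first_seen(entries, fields=("weather_1", "weather_2")):
    """Keep the first entry for each unordered combination of the given fields."""
    first_seen = {}
    for entry in entries:
        # Sorted tuple: the same values in a different order give the same key
        key = tuple(sorted(entry[f] for f in fields))
        # setdefault stores the entry only if the key is not present yet
        first_seen.setdefault(key, entry)
    return list(first_seen.values())


sample = [
    {"weather_1": "cold", "weather_2": "hot", "name": "james"},
    {"weather_1": "hot", "weather_2": "cold", "name": "james"},
    {"weather_1": "hot", "weather_2": "really cold", "name": "james"},
]
result = dedupe_first_seen(sample)
print(result)  # the first and third entries; the second is a swapped duplicate
```

Using setdefault avoids the explicit membership check, at the cost of being slightly less obvious to read.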

CodePudding user response:

Another solution is to use a set to remember the keys you've already seen. This lets you produce a generator instead of a list:

def find_unique_entries(entries):
    already_seen = set()
    for entry in entries:
        key = frozenset((entry["weather_1"], entry["weather_2"]))
        if key not in already_seen:
            already_seen.add(key)
            yield entry
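
To see why a frozenset works as the key here, the short sketch below compares the keys built from two entries whose values are swapped. The variable names are illustrative only.

```python
a = {"weather_1": "cold", "weather_2": "hot", "name": "james"}
b = {"weather_1": "hot", "weather_2": "cold", "name": "james"}

key_a = frozenset((a["weather_1"], a["weather_2"]))
key_b = frozenset((b["weather_1"], b["weather_2"]))

# Sets ignore order, so swapped values produce equal keys
same_key = key_a == key_b  # True

# A plain tuple keeps order, so it would treat the two entries as different
ordered_differs = (a["weather_1"], a["weather_2"]) != (b["weather_1"], b["weather_2"])  # True

# frozenset is hashable, so it can be stored in a set and looked up later
seen = {key_a}
already_seen = key_b in seen  # True

# Caveat: a frozenset also drops repeated values, which can merge
# distinct combinations once more than two fields are involved
collides = frozenset(("a", "a", "b")) == frozenset(("a", "b", "b"))  # True
```

With exactly two fields this collapsing is harmless, since a pair of equal values still yields a key distinct from any pair of different values. For three or more fields, a sorted tuple is probably the safer key.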

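If you need to do this in different scenarios, use more_itertools.unique_everseen, which accepts a key function (note that more_itertools.all_unique only returns True or False and does not filter anything), or roll your own general function: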

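def unique_everseen(iterable, key=lambda x: x):
    # Yield each item whose key has not been seen before, in input order
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


def find_unique_entries(entries):
    # frozenset makes the key independent of which field holds which value
    return unique_everseen(entries, key=lambda e: frozenset((e["weather_1"], e["weather_2"])))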
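
As a rough illustration of reusing a key-based filter in other scenarios, the sketch below applies one to two unrelated inputs. The helper is restated under the hypothetical name unique_by so the snippet runs on its own; it is not part of any library.

```python
def unique_by(iterable, key=lambda x: x):
    # Hypothetical stand-in for the general helper:
    # yields the first item seen for each distinct key
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Case-insensitive de-duplication of strings
words = ["Hot", "hot", "Cold", "HOT", "cold"]
unique_words = list(unique_by(words, key=str.lower))
print(unique_words)  # ['Hot', 'Cold']

# Order-insensitive de-duplication of pairs
pairs = [(1, 2), (2, 1), (3, 4)]
unique_pairs = list(unique_by(pairs, key=frozenset))
print(unique_pairs)  # [(1, 2), (3, 4)]
```

Because the helper is a generator, nothing is filtered until the result is iterated, which is why both calls are wrapped in list().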