I would like to filter a list of dictionaries for duplicate elements. Two elements count as duplicates if they have the same combination of values for two specific keys (weather_1 and weather_2), regardless of which key holds which value, i.e.
[{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'cold', 'name': 'james'},
{'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]
->
[{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
{'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]
CodePudding user response:
Extract weather_1 and weather_2 from each element, sort those two values, and use the sorted pair as a dict key, storing the first element seen for each key as the value.
arr = [{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'cold', 'name': 'james'},
{'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'},
{'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]
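d = {}
for x in arr:
    # Sort the pair so that ('cold', 'hot') and ('hot', 'cold') give the same key.
    k = tuple(sorted([x["weather_1"], x["weather_2"]]))
    # Keep only the first element seen for each pair.
    if k not in d:
        d[k] = x
list(d.values())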
# [{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
# {'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
# {'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]
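The same idea can be written a little more compactly with dict.setdefault, which stores a value only when the key is not already present. This is a minimal sketch, assuming both values can be ordered against each other (sorted needs that); the function name dedupe_by_pair and its key_a/key_b parameters are made up for illustration.

```python
def dedupe_by_pair(rows, key_a="weather_1", key_b="weather_2"):
    """Keep the first row for each unordered (key_a, key_b) pair."""
    first_seen = {}
    for row in rows:
        # Sorting makes ('cold', 'hot') and ('hot', 'cold') the same key.
        pair = tuple(sorted((row[key_a], row[key_b])))
        # setdefault leaves an existing entry untouched, so the first row wins.
        first_seen.setdefault(pair, row)
    # Dicts preserve insertion order (Python 3.7+), so input order is kept.
    return list(first_seen.values())


arr = [{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
       {'weather_1': 'hot', 'weather_2': 'cold', 'name': 'james'},
       {'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
       {'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'},
       {'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]

result = dedupe_by_pair(arr)
```

Here result holds the first, third, and fourth elements of arr, matching the output shown above.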
CodePudding user response:
Another solution is to use a set to remember the keys you've already seen. This lets you produce a generator instead of a list:
def find_unique_entries(entries):
    already_seen = set()
    for entry in entries:
        # A frozenset ignores order, so (hot, cold) and (cold, hot) match.
        key = frozenset((entry["weather_1"], entry["weather_2"]))
        if key not in already_seen:
            already_seen.add(key)
            yield entry
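As a quick check, here is how the generator might be used on the sample data from the question. The function is repeated so that the snippet runs on its own; the variable names are only illustrative.

```python
def find_unique_entries(entries):
    already_seen = set()
    for entry in entries:
        key = frozenset((entry["weather_1"], entry["weather_2"]))
        if key not in already_seen:
            already_seen.add(key)
            yield entry


arr = [{'weather_1': 'cold', 'weather_2': 'hot', 'name': 'james'},
       {'weather_1': 'hot', 'weather_2': 'cold', 'name': 'james'},
       {'weather_1': 'really cold', 'weather_2': 'cold', 'name': 'james'},
       {'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'},
       {'weather_1': 'hot', 'weather_2': 'really cold', 'name': 'james'}]

# Nothing is computed until the generator is consumed.
gen = find_unique_entries(arr)
first = next(gen)       # only the first entry has been examined so far
rest = list(gen)        # consume the remaining unique entries

# Or materialise everything at once:
unique = list(find_unique_entries(arr))
```

Because entries are yielded lazily, you can stop early or feed the generator into another loop without building an intermediate list.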
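If you need to do this in different scenarios, use more_itertools.unique_everseen (note that more_itertools.all_unique only returns True or False; it does not yield the unique items), or roll your own general function: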
def all_unique(iterable, key=lambda x: x):
    seen = set()
    for item in iterable:
        # key(item) must be hashable so it can be stored in the set.
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

def find_unique_entries(entries):
    return all_unique(entries, key=lambda e: frozenset((e["weather_1"], e["weather_2"])))
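One practical difference between the frozenset key used here and a sorted tuple is that a frozenset only needs hashable values, whereas sorting needs values that can be compared with each other. The sketch below illustrates this with a made-up pair of rows in which one value is None; that data is an assumption for the demonstration and not part of the question.

```python
def all_unique(iterable, key=lambda x: x):
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Hypothetical rows with a missing value (None) in one of the two fields.
rows = [
    {"weather_1": None, "weather_2": "cold", "name": "james"},
    {"weather_1": "cold", "weather_2": None, "name": "james"},
]

# frozenset only needs hashable members, so None and str can be mixed.
unique_rows = list(
    all_unique(rows, key=lambda e: frozenset((e["weather_1"], e["weather_2"])))
)

# Sorting the same pair fails in Python 3 because None and str are not orderable.
try:
    sorted((rows[0]["weather_1"], rows[0]["weather_2"]))
    sort_failed = False
except TypeError:
    sort_failed = True
```

If every value is a string, as in the question, both key styles behave the same.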