I have a list of dictionaries like so:
data = [{'title': 'XYZ', 'url': 'www.xyz.com'}, {'title': 'ABC', 'url': 'www.abc.com'}, {'title': 'XYZ', 'url': 'www.def.com'}]
I would like to filter this so that only one dictionary per title is retained, i.e.:
filtered = [{'title': 'XYZ', 'url': 'www.xyz.com'}, {'title': 'ABC', 'url': 'www.abc.com'}]
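It doesn't matter which of the duplicate dictionaries is retained. I tried
[x for x in data if x['title'] not in data]
but it returns the whole list unchanged, because each title string is compared against the dictionaries in data rather than against the other titles.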
Answer:
Since you don't care which record is kept, the following keeps the last record for each duplicated title, along with all the non-duplicates:
[x for i, x in enumerate(data)
 if x['title'] not in [item['title'] for item in data[i + 1:]]]
# output:
[{'title': 'ABC', 'url': 'www.abc.com'}, {'title': 'XYZ', 'url': 'www.def.com'}]
The idea is to iterate over the list with enumerate and check whether the title appears again later in the list (after index i). If it doesn't, the item is kept; otherwise it is skipped, so only the last occurrence of each title survives.
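Because the tail of the list is rescanned for every element, this approach is quadratic in the length of the list. A minimal sketch of a linear-time alternative that also keeps the last record per title, assuming the titles are hashable (strings here) and Python 3.7+ (where dicts preserve insertion order): key a dict by title so later dictionaries overwrite earlier ones. Note that each title keeps the position of its first appearance, so the result order differs from the enumerate version above.

```python
data = [{'title': 'XYZ', 'url': 'www.xyz.com'},
        {'title': 'ABC', 'url': 'www.abc.com'},
        {'title': 'XYZ', 'url': 'www.def.com'}]

# Later dicts with the same title overwrite earlier ones in the mapping;
# each title stays at the position where it was first inserted.
by_title = {d['title']: d for d in data}
filtered = list(by_title.values())
print(filtered)
# [{'title': 'XYZ', 'url': 'www.def.com'}, {'title': 'ABC', 'url': 'www.abc.com'}]
```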
Answer:
data = [{'title':'XYZ','url':'www.xyz.com'},{'title':'ABC','url':'www.abc.com'}, {'title':'XYZ','url':'www.def.com'}]
titles = []  # titles seen so far
i = 0
while len(data) > i:
    d = data[i]
    title = d['title']
    if title in titles:
        # duplicate: remove it in place; the next item shifts into index i
        data.pop(i)
    else:
        titles.append(title)
        i += 1
print(data)
Answer:
Hope this works. It keeps the first dictionary seen for each title:
data = [{'title':'XYZ','url':'www.xyz.com'},{'title':'ABC','url':'www.abc.com'},{'title':'XYZ','url':'www.def.com'}]
tested_title = []  # titles already kept
filtered = []
for test in data:
    test_title = test["title"]
    if test_title not in tested_title:
        tested_title.append(test_title)
        filtered.append(test)
print(filtered)
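Checking membership in a list scans the whole list each time, so for many records a set of seen titles is a cheap improvement. A minimal sketch of the same first-occurrence filter, assuming the titles are hashable (strings here); `unique_by_title` is a hypothetical helper name, not part of any library:

```python
def unique_by_title(records):
    """Return records in their original order, keeping only the first one per title."""
    seen = set()  # titles already kept; set lookup is O(1) on average
    result = []
    for record in records:
        title = record['title']
        if title not in seen:
            seen.add(title)
            result.append(record)
    return result


data = [{'title': 'XYZ', 'url': 'www.xyz.com'},
        {'title': 'ABC', 'url': 'www.abc.com'},
        {'title': 'XYZ', 'url': 'www.def.com'}]
filtered = unique_by_title(data)
print(filtered)
# [{'title': 'XYZ', 'url': 'www.xyz.com'}, {'title': 'ABC', 'url': 'www.abc.com'}]
```

Unlike the while-loop answer, this builds a new list and leaves data untouched.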