I have a huge list of dictionaries (I have shortened it here for clarity), where some values are duplicates (let's assume 'ID' is my target). How can I print the dictionary/ies where the ID occurs more than once?
[{'ID': 2501,
'First Name': 'Edward',
'Last Name': 'Crawford',
'Email': '[email protected]',
'Location': '[1.24564352 0.94323637]',
'Registration': '12/12/2000',
'Phone': '398-2890-30'},
{'ID': 3390936,
'First Name': 'Pepe',
'Last Name': 'Slim',
'Email': '[email protected]',
'Location': '[1.7297525 0.54631239]',
'Registration': '3/8/2020',
'Phone': '341-3456-85'}]
So far I have only been able to pull out individual fields from the list of dicts; I haven't managed to go through it and identify the duplicates.
all_phone = [i['Phone'] for i in comments]
all_email = [i['Email'] for i in comments]
Answer:
I'd suggest writing a helper function that lets you choose the field in which to look for duplicates. Building an intermediate dictionary of counts (as in @Andrej Kesely's answer) is an efficient way to find duplicates, and it generalizes neatly into a function. Here I've used a plain dictionary rather than Counter from the collections library.
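def find_duplicates(dicts, field):
    # Count how many times each value of `field` occurs.
    counts = {}
    for d in dicts:
        counts[d[field]] = counts.get(d[field], 0) + 1
    # Keep every dict whose value occurs more than once.
    return [d for d in dicts if counts[d[field]] > 1]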
phone_duplicates = find_duplicates(comments, 'Phone')
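For comparison, here is a minimal sketch of the same helper using collections.Counter in place of the hand-rolled counts dictionary. The name find_duplicates_counter and the three sample records are hypothetical, for illustration only. Like find_duplicates, it assumes every dict actually has the requested field; a record without it raises KeyError.

```python
from collections import Counter


def find_duplicates_counter(dicts, field):
    """Return every dict whose value for `field` appears more than once.

    Hypothetical Counter-based variant of find_duplicates; assumes each
    dict contains `field` (otherwise a KeyError is raised).
    """
    # Counter tallies the field values in a single pass.
    counts = Counter(d[field] for d in dicts)
    # Keep the dicts in their original order if their value repeats.
    return [d for d in dicts if counts[d[field]] > 1]


# Hypothetical sample data: ID 2501 and one phone number repeat.
comments = [
    {'ID': 2501, 'First Name': 'Edward', 'Phone': '398-2890-30'},
    {'ID': 3390936, 'First Name': 'Pepe', 'Phone': '341-3456-85'},
    {'ID': 2501, 'First Name': 'XXX', 'Phone': '341-3456-85'},
]

id_duplicates = find_duplicates_counter(comments, 'ID')
phone_duplicates = find_duplicates_counter(comments, 'Phone')
```

Both versions do the same two linear passes; Counter just saves writing the counts.get(..., 0) + 1 bookkeeping by hand.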
Answer:
You can use collections.Counter to create a counter whose keys are the IDs from your dictionaries. Then you can filter your list according to this counter:
lst = [
{
"ID": 2501,
"First Name": "Edward",
"Last Name": "Crawford",
"Email": "[email protected]",
"Location": "[1.24564352 0.94323637]",
"Registration": "12/12/2000",
"Phone": "398-2890-30",
},
{
"ID": 3390936,
"First Name": "Pepe",
"Last Name": "Slim",
"Email": "[email protected]",
"Location": "[1.7297525 0.54631239]",
"Registration": "3/8/2020",
"Phone": "341-3456-85",
},
# duplicate ID here:
{
"ID": 2501,
"First Name": "XXX",
"Last Name": "XXX",
},
]
from collections import Counter

# create a counter of IDs:
c = Counter(d["ID"] for d in lst)

# print the dictionaries whose ID occurs more than once:
for d in lst:
    if c[d["ID"]] > 1:
        print(d)
prints:
{'ID': 2501, 'First Name': 'Edward', 'Last Name': 'Crawford', 'Email': '[email protected]', 'Location': '[1.24564352 0.94323637]', 'Registration': '12/12/2000', 'Phone': '398-2890-30'}
{'ID': 2501, 'First Name': 'XXX', 'Last Name': 'XXX'}
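If you would rather see the duplicates grouped by ID than in list order, a minimal sketch (an alternative to the Counter filter above, not part of the original answer) is to bucket the dictionaries with collections.defaultdict and keep only the buckets holding more than one entry. The names groups and duplicate_groups are hypothetical, and lst is a trimmed version of the sample list.

```python
from collections import defaultdict

# Trimmed version of the sample list: ID 2501 appears twice.
lst = [
    {"ID": 2501, "First Name": "Edward", "Last Name": "Crawford"},
    {"ID": 3390936, "First Name": "Pepe", "Last Name": "Slim"},
    {"ID": 2501, "First Name": "XXX", "Last Name": "XXX"},
]

# Bucket each dictionary under its ID, preserving list order within a bucket.
groups = defaultdict(list)
for d in lst:
    groups[d["ID"]].append(d)

# Keep only the IDs that occur more than once.
duplicate_groups = {id_: ds for id_, ds in groups.items() if len(ds) > 1}

for id_, ds in duplicate_groups.items():
    print(id_, ds)
```

This is still a single pass over the data, and each duplicated ID maps to the full list of records that share it.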
Answer:
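You could loop through the list, building a new dictionary keyed by ID as you go, and catch the moment you run into a duplicate:
seen = {}
for entry in comments:
    key = entry['ID']
    if key not in seen:
        seen[key] = entry
    else:
        # you have a duplicate: print the first occurrence and this one
        print(seen[key], entry)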
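A minimal sketch of how this single-pass idea could be turned into a function that returns every dictionary with a repeated ID, first occurrence included. The function name find_duplicates_single_pass and the sample data are hypothetical. Note the result is ordered by when each duplicate is discovered (first occurrence, then its repeats), not strictly by original list position.

```python
def find_duplicates_single_pass(dicts, key='ID'):
    """Collect every dict whose `key` value repeats, in one pass.

    Hypothetical helper: the first occurrence is remembered in `first_seen`
    and only added to the result once a second occurrence shows up.
    """
    first_seen = {}   # key value -> first dict with that value
    reported = set()  # key values whose first dict is already in `result`
    result = []
    for d in dicts:
        value = d[key]
        if value not in first_seen:
            first_seen[value] = d
            continue
        # Duplicate: emit the remembered first occurrence once, then this one.
        if value not in reported:
            result.append(first_seen[value])
            reported.add(value)
        result.append(d)
    return result


# Hypothetical sample data: IDs 1111 and 2222 repeat.
sample = [
    {'ID': 1111, 'First Name': 'foo1'},
    {'ID': 2222, 'First Name': 'foo2'},
    {'ID': 1111, 'First Name': 'foo3'},
    {'ID': 3333, 'First Name': 'foo4'},
    {'ID': 2222, 'First Name': 'foo5'},
]
dupes = find_duplicates_single_pass(sample)
```

The `reported` set is what stops the first occurrence from being added again when an ID shows up a third time.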
Answer:
Using a list comprehension:
comments=[{'ID': 1111,
'First Name': 'foo1',
'Last Name': 'bar1'},
{'ID': 2222,
'First Name': 'foo2',
'Last Name': 'bar2'},
{'ID': 1111,
'First Name': 'foo3',
'Last Name': 'bar3'},
{'ID': 3333,
'First Name': 'foo4',
'Last Name': 'bar4'},
{'ID': 2222,
'First Name': 'foo5',
'Last Name': 'bar5'},]
all_ID = [i['ID'] for i in comments]
Duplicates = list(set([x for x in all_ID if all_ID.count(x) > 1]))
print("Duplicates found! =>", Duplicates)
Output:
Duplicates found! => [2222, 1111]
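One caveat: all_ID.count(x) rescans the whole list for every element, so this approach is quadratic and can get slow on a huge list. A minimal sketch of a linear-time variant, swapping in collections.Counter (the names id_counts and duplicates are illustrative), which also reports the IDs in order of first appearance rather than in arbitrary set order:

```python
from collections import Counter

# Same sample IDs as above, trimmed to the fields that matter here.
comments = [
    {'ID': 1111, 'First Name': 'foo1'},
    {'ID': 2222, 'First Name': 'foo2'},
    {'ID': 1111, 'First Name': 'foo3'},
    {'ID': 3333, 'First Name': 'foo4'},
    {'ID': 2222, 'First Name': 'foo5'},
]

all_ID = [i['ID'] for i in comments]
# Counter tallies each ID in one pass instead of calling all_ID.count(x) per item.
id_counts = Counter(all_ID)
# Counter keeps first-insertion order, so IDs come out in order of first appearance.
duplicates = [id_ for id_, n in id_counts.items() if n > 1]
print("Duplicates found! =>", duplicates)  # Duplicates found! => [1111, 2222]
```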