Extracting duplicates from a list of dictionaries in Python

I have a huge list of dictionaries (shortened here for clarity) in which some values are duplicated (let's assume 'ID' is my target). How can I print every dictionary whose ID occurs more than once?

[{'ID': 2501,
  'First Name': 'Edward',
  'Last Name': 'Crawford',
  'Email': '[email protected]',
  'Location': '[1.24564352 0.94323637]',
  'Registration': '12/12/2000',
  'Phone': '398-2890-30'},
 {'ID': 3390936,
  'First Name': 'Pepe',
  'Last Name': 'Slim',
  'Email': '[email protected]',
  'Location': '[1.7297525  0.54631239]',
  'Registration': '3/8/2020',
  'Phone': '341-3456-85'}]

I have only been able to print certain values from the list of dictionaries, but I have not been able to go through it and identify duplicates.

all_phone = [i['Phone'] for i in comments]
all_email = [i['Email'] for i in comments]

CodePudding user response：

I'd suggest writing a helper function that lets you choose which field to check for duplicates. Counting values in an intermediate dictionary (as in @Andrej Kesely's answer) is an efficient way to find duplicates, and it generalizes easily into a function. Here I've used a plain dictionary rather than Counter from the collections module.

def find_duplicates(dicts, field):
    counts = {}
    for d in dicts:
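        # count how many times each value of `field` occurs
        counts[d[field]] = counts.get(d[field], 0) + 1
    # keep every dictionary whose value occurs more than once
    return [d for d in dicts if counts[d[field]] > 1]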

phone_duplicates = find_duplicates(comments, 'Phone')
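
If you also want to see which records share the same value, a variation on the helper above is to group them instead of returning one flat list. This is only a sketch: the name group_duplicates, the sample records, and the choice to skip records that lack the field are my own assumptions, not something from the question.

```python
from collections import defaultdict

def group_duplicates(dicts, field):
    """Map each repeated value of `field` to the records that share it."""
    groups = defaultdict(list)
    for d in dicts:
        # Assumption: records without the field are skipped rather than raising KeyError.
        if field in d:
            groups[d[field]].append(d)
    # Keep only the values that appear in more than one record.
    return {value: records for value, records in groups.items() if len(records) > 1}

# Made-up sample records, for illustration only
sample = [
    {'ID': 1, 'Phone': '111'},
    {'ID': 2, 'Phone': '222'},
    {'ID': 3, 'Phone': '111'},
]

phone_groups = group_duplicates(sample, 'Phone')
for phone, records in phone_groups.items():
    print(phone, [r['ID'] for r in records])
```

Each key of the result is a duplicated value and each entry is the list of dictionaries carrying it, in their original order.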

CodePudding user response：

You can use collections.Counter to count how often each ID occurs in your list of dictionaries. Then you can filter the list according to this counter:

lst = [
    {
        "ID": 2501,
        "First Name": "Edward",
        "Last Name": "Crawford",
        "Email": "[email protected]",
        "Location": "[1.24564352 0.94323637]",
        "Registration": "12/12/2000",
        "Phone": "398-2890-30",
    },
    {
        "ID": 3390936,
        "First Name": "Pepe",
        "Last Name": "Slim",
        "Email": "[email protected]",
        "Location": "[1.7297525  0.54631239]",
        "Registration": "3/8/2020",
        "Phone": "341-3456-85",
    },
    # duplicate ID here:
    {
        "ID": 2501,
        "First Name": "XXX",
        "Last Name": "XXX",
    },
]

from collections import Counter

# create a counter:
c = Counter(d["ID"] for d in lst)

# print duplicated dictionaries:
for d in lst:
    if c[d["ID"]] > 1:
        print(d)

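prints (reformatted here for readability; Python itself prints each dictionary on one line, with single quotes):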

{
    "ID": 2501,
    "First Name": "Edward",
    "Last Name": "Crawford",
    "Email": "[email protected]",
    "Location": "[1.24564352 0.94323637]",
    "Registration": "12/12/2000",
    "Phone": "398-2890-30",
}
{"ID": 2501, "First Name": "XXX", "Last Name": "XXX"}

CodePudding user response：

You could loop through the list, build a new dictionary as you go, and catch the moment you run into a duplicate:

seen = {}
for item in comments:
    key = item['ID']
    if key not in seen:
        seen[key] = item
    else:
        # you have a duplicate
        print(item)

CodePudding user response：

Using a list comprehension:

comments = [
    {'ID': 1111, 'First Name': 'foo1', 'Last Name': 'bar1'},
    {'ID': 2222, 'First Name': 'foo2', 'Last Name': 'bar2'},
    {'ID': 1111, 'First Name': 'foo3', 'Last Name': 'bar3'},
    {'ID': 3333, 'First Name': 'foo4', 'Last Name': 'bar4'},
    {'ID': 2222, 'First Name': 'foo5', 'Last Name': 'bar5'},
]

all_ID = [i['ID'] for i in comments]

Duplicates = list(set([x for x in all_ID if all_ID.count(x) > 1]))

print("Duplicates found! =>", Duplicates)

Output:

Duplicates found! => [2222, 1111]
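
One caveat, since the question mentions a huge list: list.count scans the whole list every time it is called, so calling it once per element makes the work grow quadratically with the list length. A sketch of a single-pass alternative with collections.Counter follows; the function name duplicate_values and the trimmed-down records are only illustrative.

```python
from collections import Counter

def duplicate_values(dicts, field):
    """Return each value of `field` that occurs more than once, in first-seen order."""
    # One pass over the data to count every value.
    counts = Counter(d[field] for d in dicts)
    return [value for value, n in counts.items() if n > 1]

# Illustrative records with only the field of interest
records = [
    {'ID': 1111},
    {'ID': 2222},
    {'ID': 1111},
    {'ID': 3333},
    {'ID': 2222},
]

dupes = duplicate_values(records, 'ID')
print("Duplicates found! =>", dupes)
```

Unlike the set-based version, this returns the duplicated values in the order they first appear in the list.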