How can I compare and remove nested dictionaries with the same values within the same dictionary?

If I have a dictionary with data in it like the one below, what process should I use, such as an if statement, to delete duplicate entries like nested dictionaries 1 and 4? Let's say I want to delete 4 because the user entered it, and I'm assuming that people are unique, so no two can share the same demographics: there can't be two John R. Smiths.

people = {1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
          2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
          3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'},
          4: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}}

I am just learning, so I wouldn't be surprised if there is something simple I was unable to come up with. I attempted to compare the entries with something like if ['1']['name'] and ['1']['sex'] == ['4']['name'] and ['4']['sex']: then print['4'] just to test, and the error message told me that I need to be using indexes. I've also turned it into a list, which was successful, but I was met with another error when trying to compare the entries in a manner like: if person['name'], person['age'] and person['sex'] are equal to those of another row within a for loop, then print a message. I got nowhere. I've also tried to turn it into a DataFrame and use the pandas duplicate function to remove the duplicates, which gave me some error yesterday about 'dict', probably because the dictionaries get nested in the DataFrame, in contrast to a list with nested dictionaries, which tends to look like this:

[{1: {'name': 'John', 'age': '27', 'sex': 'Male'},
  2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}}]

CodePudding user response：

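You can take advantage of the fact that dict keys are always unique to help de-duplicate. Since dicts are unhashable and can't be used as keys directly, you can convert each sub-dict to a tuple of its items first. Note that this treats two records as duplicates only when their fields appear in the same order, and it requires every value to be hashable. Use dict.setdefault to keep only the first value stored for each distinct key: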

records = {}
for number, record in people.items():
    records.setdefault(tuple(record.items()), (number, record))
print(dict(records.values()))

Given your sample input, this outputs:

{1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}, 2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}, 3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'}}

Demo: https://replit.com/@blhsing/LonelyNumbWatch
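
If the fields of a record might have been entered in a different order, the tuple key above would no longer match. A minimal sketch of one possible variation, assuming all values are hashable, is to use frozenset instead of tuple so that the order of the key/value pairs is ignored. The reordered entry 4 below is a made-up example, not part of the original data:

```python
people = {
    1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
    2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
    3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'},
    # Hypothetical: same person as 1, but with the fields in a different order.
    4: {'sex': 'Male', 'age': '27', 'name': 'John R. Smith'},
}

records = {}
for number, record in people.items():
    # A frozenset of the items is hashable and ignores field order,
    # so entries 1 and 4 produce the same key.
    records.setdefault(frozenset(record.items()), (number, record))

# Each stored value is a (number, record) pair, so dict() rebuilds the mapping.
deduplicated = dict(records.values())
print(deduplicated)
```

As with the tuple version, setdefault keeps the first entry seen for each key, so entry 1 survives and entry 4 is dropped.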

CodePudding user response：

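One approach is to build a new dictionary by iterating over people and assigning a person to the new dictionary only if their data has not been seen before. Note that tuple(data.values()) compares the values by position, so it assumes every record has the same fields in the same order. The following solution uses a set for tracking the data already seen: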

from pprint import pprint

unique_people = {}
unique_ids = set()

for key, data in people.items():
    data_id = tuple(data.values())
    if data_id in unique_ids:
        continue
    unique_people[key] = data
    unique_ids.add(data_id)

pprint(unique_people)

Output:

{1: {'age': '27', 'name': 'John R. Smith', 'sex': 'Male'},
 2: {'age': '22', 'name': 'Marie', 'sex': 'Female'},
 3: {'age': '32', 'name': 'Mariah', 'sex': 'Female'}}
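
If you would rather delete the duplicates from people itself instead of building a new dictionary, the same idea can be adapted. This is only a sketch: the name seen is my own, and sorting the items is an extra step I added so that field order does not matter. It iterates over a copy of the items because deleting from a dict while iterating over it directly raises a RuntimeError:

```python
people = {
    1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
    2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
    3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'},
    4: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
}

seen = set()
# list(...) takes a snapshot of the items, so deleting from people is safe.
for key, data in list(people.items()):
    # Sorted (field, value) pairs identify a record regardless of field order.
    data_id = tuple(sorted(data.items()))
    if data_id in seen:
        del people[key]  # a later duplicate of an earlier entry
    else:
        seen.add(data_id)

print(people)
```

Because dicts preserve insertion order, the first occurrence (entry 1) is kept and the later one (entry 4) is removed.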