I have a dictionary with data in it like the one below. What process, such as an if statement, should I use to delete duplicate entries such as nested dictionaries 1 and 4? Let's say I want to delete 4 because the user entered it, and I'm assuming that people are unique, so they can't have the same demographics: there can't be two John R. Smiths.
people = {1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
          2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
          3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'},
          4: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}}
I am just learning, so I wouldn't be surprised if there is something simple I was unable to come up with. I attempted to compare the entries with something like if ['1']['name'] and ['1']['sex'] == ['4']['name'] and ['4']['sex']: then print['4'] just to test, and the error message told me that I need to be using indexes. I've also turned it into a list, which was successful, but I was met with another error when trying to compare the entries in a manner like: if person['name'] and person['age'] and person['sex'] are equal to another row within a for loop, then print a message. I got nowhere. I've also tried to turn it into a DataFrame and use the pandas duplicate function to remove the duplicates, which gave me some error yesterday about 'dict', probably because the dictionaries get nested in the DataFrame, in contrast to a list with nested dictionaries, which tends to look like this:
[{1: {'name': 'John', 'age': '27', 'sex': 'Male'},
  2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}}]
CodePudding user response:
You can take advantage of the fact that dict keys are always unique to help de-duplicate. Since dicts are unhashable and can't be used as keys directly, you can convert each sub-dict to a tuple of items first. Use dict.setdefault to keep only the first value for each distinct key:
records = {}
for number, record in people.items():
    records.setdefault(tuple(record.items()), (number, record))
print(dict(records.values()))
Given your sample input, this outputs:
{1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}, 2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}, 3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'}}
Demo: https://replit.com/@blhsing/LonelyNumbWatch
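One caveat: tuple(record.items()) follows insertion order, so two sub-dicts with the same contents but keys entered in a different order would not be treated as duplicates. If that can happen in your data (an assumption, since your sample always uses the same order), a frozenset of the items is one possible order-insensitive key. A minimal sketch, with a reordered record 4 made up for illustration and the name deduplicated chosen here:

```python
# Hypothetical sample: record 4 holds the same data as record 1,
# but its keys were inserted in a different order.
people = {
    1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
    2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
    3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'},
    4: {'sex': 'Male', 'age': '27', 'name': 'John R. Smith'},
}

records = {}
for number, record in people.items():
    # A frozenset ignores item order, so dicts with equal contents
    # map to the same key regardless of how they were built.
    records.setdefault(frozenset(record.items()), (number, record))

# Each stored value is a (number, record) pair, so dict() rebuilds the mapping.
deduplicated = dict(records.values())
print(deduplicated)
```

Like the tuple version, this only works while every value in the sub-dicts is hashable (strings and numbers are fine, lists are not).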
CodePudding user response:
One approach is to build a new dictionary by iterating over people and assigning a person to the new dictionary if their data is unique. The following solution uses a set for tracking unique users:
from pprint import pprint

unique_people = {}
unique_ids = set()
for key, data in people.items():
    # The tuple of values is hashable, so it can be stored in a set.
    data_id = tuple(data.values())
    if data_id in unique_ids:
        continue  # already seen this person, skip the duplicate
    unique_people[key] = data
    unique_ids.add(data_id)
pprint(unique_people)
Output:
{1: {'age': '27', 'name': 'John R. Smith', 'sex': 'Male'},
2: {'age': '22', 'name': 'Marie', 'sex': 'Female'},
3: {'age': '32', 'name': 'Mariah', 'sex': 'Female'}}