I have two lists of dictionaries, and I'm trying to remove duplicates from their structure.
My list codes has the keys code_id and groups:
codes = [{'code_id': '57025', 'groups': '1234'},
{'code_id': '57025', 'groups': '4567'},
{'code_id': '57025', 'groups': '8910'},
{'code_id': '1', 'groups': '4321'},
{'code_id': '1', 'groups': '9876'}]
For each record in my list of dictionaries, a code_id is attached to one or many groups.
My data_master has the keys code_id and groups, plus more data related to the record with the same code_id.
print(data_master)
In this context, however, entries that share a code_id are considered duplicates:
Output:
[{'code_id': '57025',
'groups': '1234',
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
{'code_id': '57025',
'groups': '4567',
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
{'code_id': '1',
'groups': '4321',
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''},
{'code_id': '1',
'groups': '9876',
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''}
]
In this result, each record with the same code_id but a different group comes back as a separate dictionary.
I have tried many approaches, and this is the one I'm currently trying:
group_list = []
for item in data_master:
    group_list.append(item['groups'])
    for data in [data for data in codes if data['code_id'] == item['code_id'] and data['groups'] != item['groups']]:
        item['groups'] = group_list
new_data_master = []
for data in data_master:
    if (item["groups"] != data["groups"] for item in new_data_master):
        new_data_master.append(data)
print(new_data_master)
Result:
[{'code_id': '57025',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
{'code_id': '57025',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
{'code_id': '1',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''},
{'code_id': '1',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''}]
This way, every dictionary gets all the groups, including ones that are not related to its code_id, and the duplicates are still there.
For each code_id, I need to return one dictionary with a list of its groups. This is the result I was expecting.
Expected result:
[{'code_id': '57025',
'groups': ['1234','4567'],
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
{'code_id': '1',
'groups': ['4321','9876'],
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''}]
CodePudding user response:
I think the following should work:
keys = list(data_master[0].keys())
id_s = {}    # code_id -> list of groups
blocks = []  # unique records with "groups" removed
for block in data_master:
    code = block["code_id"]
    # pop "groups" (this mutates data_master) so the remaining fields can be compared
    if code in id_s:
        id_s[code].append(block.pop("groups"))
    else:
        id_s[code] = [block.pop("groups")]
    b = dict(**block)
    if b not in blocks:  # "groups" was the only difference, so now we can compare
        blocks.append(b)
# put the collected groups back into every block
new_master = [{key: block[key] if key != "groups" else id_s[block["code_id"]]
               for key in keys} for block in blocks]
print(new_master)
Output:
[{'code_id': '57025',
'groups': ['1234', '4567'],
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
{'code_id': '1',
'groups': ['4321', '9876'],
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''}]
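The loop above pops "groups" out of each dictionary, so data_master itself is modified, and the `b not in blocks` check rescans the list for every record. If either of those is a problem, a variant along these lines might help. This is only a sketch: the function name merge_groups and the shortened sample records are made up for illustration, and it assumes every value other than groups is hashable (plain strings here).

```python
def merge_groups(records):
    """Merge records that differ only in 'groups', without mutating the input."""
    merged = {}  # key: all non-'groups' fields, value: the merged record
    for record in records:
        # Everything except 'groups' identifies a record.
        # Assumption: those values are hashable (plain strings in this thread).
        key = frozenset((k, v) for k, v in record.items() if k != "groups")
        if key not in merged:
            # Copy the record so the caller's dictionaries stay untouched;
            # 'groups' keeps its original position but starts as an empty list.
            merged[key] = {**record, "groups": []}
        merged[key]["groups"].append(record["groups"])
    return list(merged.values())


# Shortened, hypothetical sample with the same shape as data_master.
sample = [
    {"code_id": "57025", "groups": "1234", "name": "XXXX"},
    {"code_id": "57025", "groups": "4567", "name": "XXXX"},
    {"code_id": "1", "groups": "4321", "name": "YYYY"},
    {"code_id": "1", "groups": "9876", "name": "YYYY"},
]
merged_sample = merge_groups(sample)
print(merged_sample)
```

Because the key is built from all remaining fields, two records with the same code_id but, say, a different address would stay separate, which mirrors the full-dictionary comparison in the loop above.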
CodePudding user response:
How about this?
Use 'code_id' as the dictionary key.
Then convert the dictionary to a list once the duplicates have been removed.
Or you can keep the dictionary if you prefer.
data_master = [{'code_id': '57025',
'groups': '1234',
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
{'code_id': '57025',
'groups': '4567',
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
{'code_id': '1',
'groups': '4321',
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''},
{'code_id': '1',
'groups': '9876',
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''}
]
code_id_dict = {i['code_id']: {} for i in data_master}
for data in data_master:
    for attr in data:
        code_id_dict[data['code_id']][attr] = data[attr]
final_data = [code_id_dict[code_id] for code_id in code_id_dict]
print(final_data)
Output (note that groups holds only the last value seen for each code_id):
[{'code_id': '57025',
'groups': '4567',
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
{'code_id': '1',
'groups': '9876',
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''}]
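If the groups should be collected into a list instead of being overwritten, the same "code_id as dictionary key" idea could be adapted roughly as follows. Treat it as a sketch under assumptions: the function name group_by_code_id and the shortened sample records are invented here, and it assumes that all records sharing a code_id carry identical values in their other fields, so the first one seen is kept.

```python
def group_by_code_id(records):
    """Return one dictionary per code_id, with 'groups' collected into a list."""
    by_code = {}  # code_id -> merged record
    for record in records:
        code_id = record["code_id"]
        if code_id not in by_code:
            # First record for this code_id: copy it and start the groups list.
            by_code[code_id] = {**record, "groups": [record["groups"]]}
        elif record["groups"] not in by_code[code_id]["groups"]:
            # Later record: only add its group if it is not already listed.
            by_code[code_id]["groups"].append(record["groups"])
    # Dictionaries keep insertion order, so code_ids come out in first-seen order.
    return list(by_code.values())


# Shortened, hypothetical sample with the same shape as data_master.
sample_records = [
    {"code_id": "57025", "groups": "1234", "city": "LOS SANTOS"},
    {"code_id": "57025", "groups": "4567", "city": "LOS SANTOS"},
    {"code_id": "1", "groups": "4321", "city": "GOTHAM"},
    {"code_id": "1", "groups": "9876", "city": "GOTHAM"},
    {"code_id": "1", "groups": "9876", "city": "GOTHAM"},  # exact repeat
]
grouped = group_by_code_id(sample_records)
print(grouped)
```

The input list is left unchanged, and an exact repeat of a record does not add its group a second time.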