How to match data in a list of dictionaries and remove duplicates?

Time:09-22

I have two lists of dictionaries, and I'm trying to remove the duplicates from them.

My list codes has the keys code_id and groups:

codes = [{'code_id': '57025', 'groups': '1234'}, 
{'code_id': '57025', 'groups': '4567'}, 
{'code_id': '57025', 'groups': '8910'},
{'code_id': '1', 'groups': '4321'},
{'code_id': '1', 'groups': '9876'}]

For each record in my list of dictionaries, a code_id is associated with one or more groups.
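
To make that relationship concrete, here is a minimal sketch that collects the groups for each code_id from the codes list above. The name groups_by_code is an assumption for illustration and does not appear in the original code.

```python
from collections import defaultdict

codes = [{'code_id': '57025', 'groups': '1234'},
         {'code_id': '57025', 'groups': '4567'},
         {'code_id': '57025', 'groups': '8910'},
         {'code_id': '1', 'groups': '4321'},
         {'code_id': '1', 'groups': '9876'}]

# groups_by_code is a hypothetical name: it maps code_id -> list of its groups
groups_by_code = defaultdict(list)
for entry in codes:
    groups_by_code[entry['code_id']].append(entry['groups'])

print(dict(groups_by_code))
# {'57025': ['1234', '4567', '8910'], '1': ['4321', '9876']}
```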

My data_master list has the keys code_id and groups, plus more data related to each record with the same code_id.

In this context, records that share the same code_id are considered duplicates:

print(data_master)

Output:

[{'code_id': '57025', 
'groups': '1234', 
'initials': 'XXXXX', 
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET', 
'number_1': '',
'number_2': ''},

{'code_id': '57025', 
'groups': '4567', 
'initials': 'XXXXX', 
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET', 
'number_1': '',
'number_2': ''},

{'code_id': '1', 
'groups': '4321', 
'initials': 'YYYY', 
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER', 
'number_1': '',
'number_2': ''},

{'code_id': '1', 
'groups': '9876', 
'initials': 'YYYY', 
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER', 
'number_1': '',
'number_2': ''}
]

In this result, each record with the same code_id but a different group is returned as a separate dictionary.

I have tried many approaches, and this is the one I'm currently trying:

group_list = []
for item in data_master:
  group_list.append(item['groups']) 
  for data in [data for data in codes if data['code_id'] == item['code_id'] and data['groups'] != item['groups']]:
    item['groups'] = group_list


new_data_master = []

for data in data_master:
  if (item["groups"] != data["groups"] for item in new_data_master):
    new_data_master.append(data)

print(new_data_master)

Result:

[{'code_id': '57025', 
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
 
{'code_id': '57025',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},

{'code_id': '1',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''},

{'code_id': '1',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''}]

This way, every record gets all of the groups, including ones that are not related to its code_id.
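
Two things in the attempt above appear to cause this. Every record is assigned the same group_list object, which keeps accumulating groups across all code_ids, and the condition in the second loop is a generator expression, which is always truthy, so nothing is filtered out. A small sketch of both effects, using made-up sample values:

```python
# 1) Assigning the same list object to several records shares its contents.
group_list = []
first = {'groups': group_list}
second = {'groups': group_list}
group_list.append('9876')          # shows up in both records
shared = first['groups'] is second['groups']
print(shared, first['groups'], second['groups'])
# True ['9876'] ['9876']

# 2) A generator expression is an object and is therefore always truthy,
#    so `if (... for item in ...)` never filters anything out.
seen = [{'groups': ['1234']}]      # made-up sample data
candidate = {'groups': ['1234']}
always_true = bool(item['groups'] != candidate['groups'] for item in seen)
# all() actually evaluates the comparisons.
really_new = all(item['groups'] != candidate['groups'] for item in seen)
print(always_true, really_new)
# True False
```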

For each code_id, I need to return one dictionary with a list of its groups. That is the result I was expecting.

Expected result:

[{'code_id': '57025', 
'groups': ['1234','4567'],
'initials': 'XXXXX', 
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET', 
'number_1': '',
'number_2': ''},

{'code_id': '1', 
'groups': ['4321','9876'],
'initials': 'YYYY', 
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER', 
'number_1': '',
'number_2': ''}]

CodePudding user response:

I think the following should work:


keys = list(data_master[0].keys())  # remember the original key order

id_s = {}    # code_id -> list of groups seen for that code_id
blocks = []  # unique records with "groups" removed
for block in data_master:
    code = block["code_id"]
    # remove "groups" so records can be compared (note: this modifies data_master)
    id_s.setdefault(code, []).append(block.pop("groups"))

    b = dict(block)
    if b not in blocks:  # "groups" was the only differing field, so now we can compare
        blocks.append(b)

# put the collected list of groups back into every block
new_master = [{key: block[key] if key != "groups" else id_s[block["code_id"]]
               for key in keys} for block in blocks]

Output:

[{'code_id': '57025',
  'groups': ['1234', '4567'],
  'initials': 'XXXXX',
  'name': 'XXXX',
  'city': 'LOS SANTOS',
  'postal_code': '02938402-9093',
  'uf': 'US',
  'address': 'GROOVE STREET',
  'number_1': '',
  'number_2': ''},
 {'code_id': '1',
  'groups': ['4321', '9876'],
  'initials': 'YYYY',
  'name': 'YYYY',
  'city': 'GOTHAM',
  'postal_code': '930489038-5679',
  'uf': 'US',
  'address': 'WAYNE TOWER',
  'number_1': '',
  'number_2': ''}]

CodePudding user response:

How about this?
Use 'code_id' as the dictionary key.
Then convert the dictionary back to a list once the duplicates have been removed.
Or keep the dictionary if you prefer.

data_master = [{'code_id': '57025',
                'groups': '1234',
                'initials': 'XXXXX',
                'name': 'XXXX',
                'city': 'LOS SANTOS',
                'postal_code': '02938402-9093',
                'uf': 'US',
                'address': 'GROOVE STREET',
                'number_1': '',
                'number_2': ''},

               {'code_id': '57025',
                'groups': '4567',
                'initials': 'XXXXX',
                'name': 'XXXX',
                'city': 'LOS SANTOS',
                'postal_code': '02938402-9093',
                'uf': 'US',
                'address': 'GROOVE STREET',
                'number_1': '',
                'number_2': ''},

               {'code_id': '1',
                'groups': '4321',
                'initials': 'YYYY',
                'name': 'YYYY',
                'city': 'GOTHAM',
                'postal_code': '930489038-5679',
                'uf': 'US',
                'address': 'WAYNE TOWER',
                'number_1': '',
                'number_2': ''},

               {'code_id': '1',
                'groups': '9876',
                'initials': 'YYYY',
                'name': 'YYYY',
                'city': 'GOTHAM',
                'postal_code': '930489038-5679',
                'uf': 'US',
                'address': 'WAYNE TOWER',
                'number_1': '',
                'number_2': ''}
               ]

# Use code_id as the dictionary key; later records overwrite earlier ones
code_id_dict = {i['code_id']: {} for i in data_master}
for data in data_master:
    for attr in data:
        code_id_dict[data['code_id']][attr] = data[attr]

# Convert the dictionary back to a list
final_data = list(code_id_dict.values())
print(final_data)

Output:

[{'code_id': '57025', 
   'groups': '4567', 
   'initials': 'XXXXX', 
   'name': 'XXXX', 
   'city': 'LOS SANTOS', 
   'postal_code': '02938402-9093', 
   'uf': 'US', 
   'address': 'GROOVE STREET', 
   'number_1': '', 
   'number_2': ''},
  {'code_id': '1', 
   'groups': '9876', 
   'initials': 'YYYY', 
   'name': 'YYYY', 
   'city': 'GOTHAM', 
   'postal_code': '930489038-5679', 
   'uf': 'US', 
   'address': 'WAYNE TOWER', 
   'number_1': '', 
   'number_2': ''}]
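
Note that in the output above each code_id keeps only the last groups value it saw, because later records overwrite earlier ones. Below is a hedged sketch of the same dictionary-keyed idea that collects the groups into a list instead. The function name merge_by_code_id and the shortened sample records are assumptions for illustration, and the sketch assumes that records sharing a code_id differ only in groups.

```python
def merge_by_code_id(records):
    """Return one dict per code_id, with 'groups' collected into a list.

    Hypothetical helper; assumes records that share a code_id differ only
    in their 'groups' value. The input list is not modified.
    """
    merged = {}
    for record in records:
        code = record['code_id']
        if code not in merged:
            # First record for this code_id: copy it and start the list.
            merged[code] = {**record, 'groups': [record['groups']]}
        elif record['groups'] not in merged[code]['groups']:
            merged[code]['groups'].append(record['groups'])
    return list(merged.values())


# Shortened sample records (only some of the original keys are shown)
sample = [
    {'code_id': '57025', 'groups': '1234', 'city': 'LOS SANTOS'},
    {'code_id': '57025', 'groups': '4567', 'city': 'LOS SANTOS'},
    {'code_id': '1', 'groups': '4321', 'city': 'GOTHAM'},
    {'code_id': '1', 'groups': '9876', 'city': 'GOTHAM'},
]
print(merge_by_code_id(sample))
# [{'code_id': '57025', 'groups': ['1234', '4567'], 'city': 'LOS SANTOS'},
#  {'code_id': '1', 'groups': ['4321', '9876'], 'city': 'GOTHAM'}]
```

Keying on code_id keeps the lookup constant-time per record, and copying each first record leaves the original list untouched.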