How to extract unique dictionaries from a list of dictionaries, with preference given to a value

Time:02-25

I have the list of dictionaries below:

test = [ { 'id': '195', 'Name': 'i', 'Email': '[email protected]', 'role': 'Product' }, 
        { 'id': '219', 'Name': 'umar', 'Email': '[email protected]', 'role': 'Product' }, 
        { 'id': '74', 'Name': 'Are', 'Email': '[email protected]', 'role': 'Tester' },
        { 'id': '24', 'Name': 'Mee', 'Email': '[email protected]', 'role': 'Tester' },
        { 'id': '230', 'Name': 'abc', 'Email': '[email protected]', 'role': 'Tester' },
        { 'id': '220', 'Name': 'Sc', 'Email': '[email protected]', 'role': 'Product' },
        { 'id': '230', 'Name': 'Sn', 'Email': '[email protected]', 'role': 'Tester' } ] 
  • I need to keep one dictionary per unique email from the list above
  • When an email appears more than once, I need to prefer the role "Product" over "Tester"

My code is below:

dict([(d['Email'], d) for d in test]).values()

My output:

dict_values([{'id': '195', 'Name': 'i', 'Email': '[email protected]', 'role': 'Product'}, 
{'id': '219', 'Name': 'umar', 'Email': '[email protected]', 'role': 'Product'}, 
{'id': '74', 'Name': 'Are', 'Email': '[email protected]', 'role': 'Tester'}, 
{'id': '24', 'Name': 'Mee', 'Email': '[email protected]', 'role': 'Tester'}, 
{'id': '230', 'Name': 'Sn', 'Email': '[email protected]', 'role': 'Tester'}])

In my output, this entry

{'id': '230', 'Name': 'Sn', 'Email': '[email protected]', 'role': 'Tester'}

has to be replaced with

{ 'id': '220', 'Name': 'Sc', 'Email': '[email protected]', 'role': 'Product' }

because "Product" has higher preference.

How can I update my code? dict([(d['Email'], d) for d in test]).values()

CodePudding user response:

Here is an approach in case you want to stick with dictionaries. We go through the rows one by one and check whether the email is already a key in the new dictionary.

  • If not, we add the row under that email.
  • If so, we check the new row: if its role is "Product", we delete what was already in the dictionary and add the new row.
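new_dict = {}
for row in test:
    email = row["Email"]
    if email not in new_dict:
        # First time this email is seen: store the row.
        new_dict[email] = row
    elif row["role"] == "Product":
        # Drop the stored row and add the "Product" row in its place.
        new_dict.pop(email)
        new_dict[email] = row

# The unique rows are the values of the dictionary.
unique_rows = list(new_dict.values())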
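
The rule above only special-cases "Product". If more than two roles ever need ranking, the same single pass can be driven by a preference table instead. The following is a minimal sketch under stated assumptions: `ROLE_RANK`, `unique_by_email`, and the `example.com` addresses are hypothetical names introduced for illustration, since the real emails are obscured in the question.

```python
# Lower rank means higher preference; roles not listed here rank last.
# ROLE_RANK and the sample addresses are assumptions for illustration.
ROLE_RANK = {"Product": 0, "Tester": 1}


def unique_by_email(rows, rank=ROLE_RANK):
    """Keep one row per Email, preferring the role with the lowest rank."""
    best = {}
    for row in rows:
        current = best.get(row["Email"])
        new_rank = rank.get(row["role"], len(rank))
        # Replace only when the new row strictly outranks the stored one,
        # so the earliest row wins ties.
        if current is None or new_rank < rank.get(current["role"], len(rank)):
            best[row["Email"]] = row
    return list(best.values())


sample = [
    {"id": "230", "Name": "abc", "Email": "shared@example.com", "role": "Tester"},
    {"id": "220", "Name": "Sc", "Email": "shared@example.com", "role": "Product"},
    {"id": "231", "Name": "Sn", "Email": "shared@example.com", "role": "Tester"},
    {"id": "74", "Name": "Are", "Email": "solo@example.com", "role": "Tester"},
]

result = unique_by_email(sample)
print([r["Name"] for r in result])
```

Unlike the pop-and-add version, assigning to an existing key keeps the email at its original position in the dictionary, so the result follows the order in which each email first appeared.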

CodePudding user response:

Perhaps you could try it with two loops: the first groups the rows by email, and the second prioritizes "Product" within each group.

It wasn't clear what should happen if there is no "Product" among duplicate emails, so in the loop below the first row for that email is selected in that case.

tmp = {}
for d in test:
    tmp.setdefault(d['Email'], []).append(d)
    
out = []
for k, lst in tmp.items():
    if len(lst) == 1:
        out.append(lst[0])
    else:
        for d in lst:
            if d['role'] == 'Product':
                out.append(d)
                break
        else:
            out.append(lst[0])

Output:

[{'id': '195', 'Name': 'i', 'Email': '[email protected]', 'role': 'Product'},
 {'id': '219', 'Name': 'umar', 'Email': '[email protected]', 'role': 'Product'},
 {'id': '74', 'Name': 'Are', 'Email': '[email protected]', 'role': 'Tester'},
 {'id': '24', 'Name': 'Mee', 'Email': '[email protected]', 'role': 'Tester'},
 {'id': '220', 'Name': 'Sc', 'Email': '[email protected]', 'role': 'Product'}]
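
The group-then-pick idea can also be written more compactly with `min()`, which may be easier to extend if the preference order grows. This is only a sketch: `pick_preferred`, the `preference` tuple, and the `example.com` addresses are hypothetical, and it keeps the same assumption as above that the first row wins when no preferred role is present.

```python
from collections import defaultdict


def pick_preferred(rows, preference=("Product", "Tester")):
    """Group rows by Email, then keep the most preferred row of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Email"]].append(row)

    def rank(row):
        # Position in the preference tuple; unknown roles rank last.
        role = row["role"]
        return preference.index(role) if role in preference else len(preference)

    # min() returns the first minimal element, so ties go to the earliest row.
    return [min(group, key=rank) for group in groups.values()]


# Hypothetical sample data; the addresses are placeholders.
rows = [
    {"id": "230", "Name": "abc", "Email": "shared@example.com", "role": "Tester"},
    {"id": "220", "Name": "Sc", "Email": "shared@example.com", "role": "Product"},
    {"id": "74", "Name": "Are", "Email": "solo@example.com", "role": "Tester"},
    {"id": "24", "Name": "Mee", "Email": "twin@example.com", "role": "Tester"},
    {"id": "25", "Name": "Moo", "Email": "twin@example.com", "role": "Tester"},
]

picked = pick_preferred(rows)
print([r["Name"] for r in picked])
```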

CodePudding user response:

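Convert it to a DataFrame, sort by the role column, and then drop duplicates by Email. This works because "Product" sorts before "Tester" alphabetically, so the "Product" row comes first for each email and is the one that drop_duplicates keeps.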

test = [ { 'id': '195', 'Name': 'i', 'Email': '[email protected]', 'role': 'Product' }, 
        { 'id': '219', 'Name': 'umar', 'Email': '[email protected]', 'role': 'Product' }, 
        { 'id': '74', 'Name': 'Are', 'Email': '[email protected]', 'role': 'Tester' },
        { 'id': '24', 'Name': 'Mee', 'Email': '[email protected]', 'role': 'Tester' },
        { 'id': '230', 'Name': 'abc', 'Email': '[email protected]', 'role': 'Tester' },
        { 'id': '220', 'Name': 'Sc', 'Email': '[email protected]', 'role': 'Product' },
        { 'id': '230', 'Name': 'Sn', 'Email': '[email protected]', 'role': 'Tester' } ] 

import pandas as pd

df = pd.DataFrame(test)

# Sort so that "Product" rows come first within each Email.
df1 = df.sort_values(by=["Email", "role"], ascending=True)
# Keep only the first row for each Email.
res_df = df1.drop_duplicates(["Email"])

# Convert each remaining row back to a dictionary.
output_list = res_df.to_dict("records")

> output_list

[{'id': '195', 'Name': 'i', 'Email': '[email protected]', 'role': 'Product'},
 {'id': '219', 'Name': 'umar', 'Email': '[email protected]', 'role': 'Product'},
 {'id': '74', 'Name': 'Are', 'Email': '[email protected]', 'role': 'Tester'},
 {'id': '220', 'Name': 'Sc', 'Email': '[email protected]', 'role': 'Product'},
 {'id': '24', 'Name': 'Mee', 'Email': '[email protected]', 'role': 'Tester'}]
