Remove duplicates with low score in list of dicts

Time:02-19

I currently have the following list of dicts:

[
    {'id': '1', 'sim': 0.81},
    {'id': '1', 'sim': 0.72},
    {'id': '2', 'sim': 0.85},    
    {'id': '2', 'sim': 0.81},
    {'id': '2', 'sim': 0.72}
]

For each id, I'd like to keep only the entry with the highest sim and remove the other duplicates, to get the following:

[
    {'id': '1', 'sim': 0.81},
    {'id': '2', 'sim': 0.85},
]

CodePudding user response:

One way to go would be to use pandas:

import pandas as pd

d = [
    {'id': '1', 'sim': 0.81},
    {'id': '1', 'sim': 0.72},
    {'id': '2', 'sim': 0.85},
    {'id': '2', 'sim': 0.81},
    {'id': '2', 'sim': 0.72}
]

df = pd.DataFrame(d)

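# as_index=False keeps 'id' as a regular column, so the result is a DataFrame
# with one row per id holding that id's highest sim.
df = df.groupby('id', sort=False, as_index=False)['sim'].max()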

You can then keep working with the result in pandas, or convert it back to a list of dictionaries, depending on what you need.
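If you do need the list of dictionaries back, here is a minimal sketch of one way to do it. It assumes the same `d` as above and uses `idxmax` to pick the whole row with the highest sim per id, which also keeps any extra keys the dicts might have. The names `best_rows` and `result` are only illustrative.

```python
import pandas as pd

d = [
    {'id': '1', 'sim': 0.81},
    {'id': '1', 'sim': 0.72},
    {'id': '2', 'sim': 0.85},
    {'id': '2', 'sim': 0.81},
    {'id': '2', 'sim': 0.72}
]

df = pd.DataFrame(d)

# For each id, idxmax returns the row label of the highest sim.
best_labels = df.groupby('id', sort=False)['sim'].idxmax()

# Select those full rows and turn them back into a list of dicts.
best_rows = df.loc[best_labels]
result = best_rows.to_dict('records')

print(result)
```

With the sample data this should give one dict per id, in order of first appearance.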

CodePudding user response:

sims_list = [
    {'id': '1', 'sim': 0.81},
    {'id': '1', 'sim': 0.72},
    {'id': '2', 'sim': 0.85},
    {'id': '2', 'sim': 0.81},
    {'id': '2', 'sim': 0.72}
]

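result = []
for each_sim in sims_list:
    # Look for an entry with the same id that was already kept.
    for each_result in result:
        if each_result["id"] == each_sim["id"]:
            # Same id seen before: keep the larger sim.
            each_result["sim"] = max(each_result["sim"], each_sim["sim"])
            break
    else:
        # No break means this id is new. Append a copy so that later
        # updates to result do not modify the dicts in sims_list.
        result.append(dict(each_sim))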

print(result)

Output

[{'id': '1', 'sim': 0.81}, {'id': '2', 'sim': 0.85}]
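The nested loop above rescans `result` for every item, which can get slow on long lists. A possible alternative, sketched here under the assumption that the id values are hashable, is to track the best entry per id in a plain dict in a single pass. The function name `keep_highest` and its parameters are hypothetical, not from the answers above.

```python
def keep_highest(records, key="id", score="sim"):
    """Return one record per key value, keeping the one with the highest score."""
    best = {}
    for record in records:
        current = best.get(record[key])
        # Keep this record if the key is new or its score beats the stored one.
        if current is None or record[score] > current[score]:
            best[record[key]] = record
    # Dicts preserve insertion order, so keys come out in first-seen order.
    return list(best.values())


sims_list = [
    {'id': '1', 'sim': 0.81},
    {'id': '1', 'sim': 0.72},
    {'id': '2', 'sim': 0.85},
    {'id': '2', 'sim': 0.81},
    {'id': '2', 'sim': 0.72}
]

result = keep_highest(sims_list)
print(result)
```

This version keeps the whole winning dict rather than only updating its sim, and it leaves `sims_list` untouched. On ties it keeps the first entry seen.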