I currently have the following list of dicts:
[
{'id': '1', 'sim': 0.81},
{'id': '1', 'sim': 0.72},
{'id': '2', 'sim': 0.85},
{'id': '2', 'sim': 0.81},
{'id': '2', 'sim': 0.72}
]
I'd like to remove the duplicates that do not have the highest sim for their id, and get the following:
[
{'id': '1', 'sim': 0.81},
{'id': '2', 'sim': 0.85},
]
CodePudding user response:
One way to go would be to use pandas:
import pandas as pd
d = [
{'id': '1', 'sim': 0.81},
{'id': '1', 'sim': 0.72},
{'id': '2', 'sim': 0.85},
{'id': '2', 'sim': 0.81},
{'id': '2', 'sim': 0.72}
]
df = pd.DataFrame(d)
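# as_index=False keeps 'id' as a regular column, so the result is a DataFrame rather than a Series
df = df.groupby('id', sort=False, as_index=False)['sim'].max()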
You can then keep working with the result in pandas, or convert it back to a list of dictionaries, depending on what you need.
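Converting back to a list of dicts could look roughly like the sketch below. It assumes pandas is installed and uses `as_index=False` so that `id` stays a column. The name `records` is only illustrative.

```python
import pandas as pd

d = [
    {'id': '1', 'sim': 0.81},
    {'id': '1', 'sim': 0.72},
    {'id': '2', 'sim': 0.85},
    {'id': '2', 'sim': 0.81},
    {'id': '2', 'sim': 0.72},
]

df = pd.DataFrame(d)
# Keep the highest sim per id; sort=False preserves first-appearance order of the ids.
best = df.groupby('id', sort=False, as_index=False)['sim'].max()
# One dict per row, with the same shape as the input items.
records = best.to_dict(orient='records')
print(records)
```

Note that this only works cleanly because each item has just the `id` and `sim` keys. If the dicts carried other fields, taking `max()` per column would not necessarily return values from the same original row.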
CodePudding user response:
sims_list = [
{'id': '1', 'sim': 0.81},
{'id': '1', 'sim': 0.72},
{'id': '2', 'sim': 0.85},
{'id': '2', 'sim': 0.81},
{'id': '2', 'sim': 0.72}
]
result = []
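for each_sim in sims_list:
    for each_result in result:
        if each_result["id"] == each_sim["id"]:
            # This id is already in result: keep the larger sim.
            each_result["sim"] = max(each_result["sim"], each_sim["sim"])
            break
    else:
        # The inner loop did not break, so this id is new.
        # Append a copy so the dicts in sims_list are not modified later.
        result.append(dict(each_sim))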
print(result)
Output:
[{'id': '1', 'sim': 0.81}, {'id': '2', 'sim': 0.85}]
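The nested loop rescans `result` for every item, which is fine for short lists but grows quadratically with the number of distinct ids. A possible alternative, sketched here with a plain dict keyed by `id` (the name `best` is only illustrative), does the same job in a single pass:

```python
sims_list = [
    {'id': '1', 'sim': 0.81},
    {'id': '1', 'sim': 0.72},
    {'id': '2', 'sim': 0.85},
    {'id': '2', 'sim': 0.81},
    {'id': '2', 'sim': 0.72},
]

best = {}  # maps each id to the item with the highest sim seen so far
for item in sims_list:
    current = best.get(item["id"])
    if current is None or item["sim"] > current["sim"]:
        best[item["id"]] = item

# Dicts preserve insertion order (Python 3.7+), so ids come out in first-seen order.
result = list(best.values())
print(result)
```

Unlike the loop version, this keeps the whole winning item rather than overwriting only `sim`, so any extra keys on the dicts would stay consistent with the chosen entry.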