I tried to remove duplicates from a list of pydantic objects, but ran into a problem I could not solve. The only method that works is very slow.
Is there a faster way to remove the duplicates?
Code:
Pydantic model (a.py)
from pydantic import BaseModel
class Photo(BaseModel):
    title: str
    url: str
Main file (b.py)
from collections import OrderedDict
from a import Photo
# 3 objects, 2 duplicates
a_obj = {
    'title': 'SOME TITLE v1',
    'url': 'http://some.url'
}
b_obj = {
    'title': 'SOME TITLE v2',
    'url': 'http://different.url'
}
c_obj = {
    'title': 'SOME TITLE v1',
    'url': 'http://some.url'
}
# Creating list of pydantic objects
pd_obj_list = list()
pd_obj_list.append(Photo(**a_obj))
pd_obj_list.append(Photo(**b_obj))
pd_obj_list.append(Photo(**c_obj))
# My Attempts to Remove Duplicates
# Using OrderedDict.fromkeys
final_list_0 = list(OrderedDict.fromkeys(pd_obj_list))
# raises TypeError: unhashable type: 'Photo'
# Using Set
final_list_1 = list(set(pd_obj_list))
# raises TypeError: unhashable type: 'Photo'
# Using enumerate
final_list_2 = [i for n, i in enumerate(pd_obj_list) if i not in pd_obj_list[:n]]
# It works, but it is too slow when the list has ~10k objects
CodePudding user response:
Use:
pd_obj_list = [Photo(**a_obj), Photo(**b_obj), Photo(**c_obj)]
final_list_0 = list(OrderedDict(((photo.title, photo.url), photo) for photo in pd_obj_list).values())
print(final_list_0)
Output
[Photo(title='SOME TITLE v1', url='http://some.url'), Photo(title='SOME TITLE v2', url='http://different.url')]
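The same idea can be wrapped in a small helper that tracks the keys already seen, so each object is examined once instead of being compared against every earlier object. This is only a sketch: `unique_by` is a hypothetical name, and `PhotoStandIn` is a plain dataclass used in place of the pydantic model so that the example runs with the standard library alone (like the model in the question, it compares by value but is unhashable).

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


@dataclass  # eq=True without frozen=True leaves instances unhashable
class PhotoStandIn:
    # Stand-in for the pydantic Photo model (assumption, stdlib only)
    title: str
    url: str


def unique_by(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Return items in their original order, keeping the first item per key."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:  # set membership is O(1) on average
            seen.add(k)
            result.append(item)
    return result


photos = [
    PhotoStandIn("SOME TITLE v1", "http://some.url"),
    PhotoStandIn("SOME TITLE v2", "http://different.url"),
    PhotoStandIn("SOME TITLE v1", "http://some.url"),
]
unique_photos = unique_by(photos, key=lambda p: (p.title, p.url))
print(unique_photos)
```

With the real model the call would look the same, for example `unique_by(pd_obj_list, key=lambda p: (p.title, p.url))`.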
If Photo is immutable, you could define __hash__ as follows:
from collections import OrderedDict
from pydantic import BaseModel
class Photo(BaseModel):
    title: str
    url: str

    def __hash__(self):
        return hash((self.title, self.url))
# 3 objects, 2 duplicates
a_obj = {
    'title': 'SOME TITLE v1',
    'url': 'http://some.url'
}
b_obj = {
    'title': 'SOME TITLE v2',
    'url': 'http://different.url'
}
c_obj = {
    'title': 'SOME TITLE v1',
    'url': 'http://some.url'
}
pd_obj_list = [Photo(**a_obj), Photo(**b_obj), Photo(**c_obj)]
final_list_0 = list(OrderedDict.fromkeys(pd_obj_list))
print(final_list_0)
Output
[Photo(title='SOME TITLE v1', url='http://some.url'), Photo(title='SOME TITLE v2', url='http://different.url')]
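The immutability condition matters because a set or dict stores each object under the hash it had when it was inserted. The sketch below illustrates this with a hypothetical `MutablePhoto` class (a plain Python class standing in for the model, so pydantic is not needed to run it): once a field changes, the object can no longer be found in the set that contains it.

```python
class MutablePhoto:
    # Hypothetical stand-in: hashable by value, but fields can still change
    def __init__(self, title: str, url: str) -> None:
        self.title = title
        self.url = url

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, MutablePhoto):
            return NotImplemented
        return (self.title, self.url) == (other.title, other.url)

    def __hash__(self) -> int:
        return hash((self.title, self.url))


photo = MutablePhoto("SOME TITLE v1", "http://some.url")
seen = {photo}
found_before = photo in seen  # hash matches the stored entry

photo.title = "CHANGED"  # the object's hash is different from now on
found_after = photo in seen  # lookup uses the new hash and misses

print(found_before, found_after)
```

If the fields have to change after creation, deduplicating by an explicit (title, url) key, as in the first snippet of this answer, avoids relying on __hash__ at all.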