Which of the following is more efficient in django if the length of ids
is not big and the length of queryset
might be too large?
items = [item for item in queryset.filter(id__in=ids)]
items = [queryset.get(id=id) for id in ids]
N.B: it's guaranteed that there is no such id that will mismatch. Also there might be several cases where length of ids is 1
CodePudding user response:
The query that uses .filter(…)
will be more efficient: this will perform one query on the database, whereas the solution with .get(…)
will run n queries on the database, with n the number of elements in ids
.
While that .filter(…)
query might take a bit longer, constructing the query, sending the query to the database, decoding the query, constructing an execution plan, and sending the result back will all be done once with .filter(…)
, whereas for the .get(…)
solution, this will be done n times.
Note that the two are semantically not equivalent: the .filter(…)
will retrieve all items for which an id
exists, the .get(…)
solution on the other hand will error in case it can not find an id.
In case the number of items to return is huge, you should not store these in a list, but work with an .iterator(…)
[Django-doc] to process the items in chunks, so:
for item in queryset.filter(id__in=ids).iterator():
# do something …
If you would store the items in a list, that would defeat the purpose of the iterator, since then all objects are all "alive" at the same time, and thus there should be enough memory to keep these in memory.
CodePudding user response:
If we want to measure the efficiency then we have to think about the query at first. Because both are doing list comprehension and we have nothing to do with that.
Now if we come to the queryset.filter
and queryset.get
and try to understand the equivalent SQL query.
- SQL query count: For
filter
there will be only one query but forget
there will ben
number of query. Even if then
is 2,filter
is more efficient thanget
. - Execution efficiency: This one also relay on SQL query. Because as the
get
implementation says it will create query and perform it each time. Butfilter
will create a single query and perform it for one time. And transnational execution(Making a query to DB) is always slower than doing it in the code.
You can see the SQL query of queryset.filter
by printing the query
attribute of it.
I think filter
is more efficient than get
in this case.