Home > database >  How to effectively iterate by many foreignkey objects with Django ORM?
How to effectively iterate by many foreignkey objects with Django ORM?

Time:11-01

Let's say that I have model like:

class User(AbstractUser):

  ...
  seller = models.ForeignKey(Seller, related_name="user", on_delete=models.SET_NULL, null=True)
  ...

And I trying to get seller email address using this code:

from app.models import User
from django_print_sql import print_sql

with print_sql(count_only=False):
  users = User.objects.filter(is_active=True, seller_id__isnull=False).select_related().only('seller__email')
  for u in users.iterator():
    email = u.seller.email
    send_email(email)

In this case I can see SQL queries like:

SELECT `user`.`id`,
       `user`.`seller_id`
FROM `user`
WHERE (`user`.`is_active`
       AND `user`.`seller_id` IS NOT NULL)

...

SELECT `seller`.`id`,
       ...
       `seller`.`email`,
       ...
FROM `seller`
WHERE `seller`.`id` = 1

...

The problem is Django ORM accessing DB at every iteration (Select seller... where seller.id = ...). So that will be too many queries (== DB connections) if we have many sellers.

In other way it is possible to replace only with values:

from app.models import User
from django_print_sql import print_sql

with print_sql(count_only=False):
  users = User.objects.filter(is_active=True, seller_id__isnull=False).select_related().values('seller__email')
  for u in users.iterator():
    email = u['seller__email']
    send_email(email)

And I can see SQL query like:

SELECT `seller`.`email`
FROM `user`
INNER JOIN `seller` ON (`user`.`seller_id` = `seller`.`id`)
WHERE (`user`.`is_active`
       AND `user`.`seller_id` IS NOT NULL)

It is little bit better and we able to get emails by single DB query, but iterator is useless here because we load all emails once inside dict object and whole data exists inside memory.

The question is
Is it possible to iterate by data chunks (Let say by 500 emails/single query) without create manual sub-loops limiting User.objects.filter query count? Or in other words, what is most effective iterator in this case?

CodePudding user response:

Looks like select_related is working for this case:

from app.models import User
from django_print_sql import print_sql

with print_sql(count_only=False):
  users = User.objects.filter(is_active=True, seller_id__isnull=False).select_related('seller').only('seller__email').exclude(seller__email__exact='')
  for u in users.iterator():
    send_email(u.seller.email)
SELECT `user`.`id`,
       `user`.`owner_id`,
       `seller`.`id`,
       `seller`.`email`
FROM `user`
INNER JOIN `seller` ON (`user`.`seller_id` = `seller`.`id`)
WHERE (`user`.`is_active`
       AND `user`.`seller_id` IS NOT NULL
       AND NOT (`seller`.`email` =
                AND `seller`.`email` IS NOT NULL))

CodePudding user response:

In Your case You give no parameters for select_related, I suggest You to put seller field to select_related like this:

from app.models import User
from django_print_sql import print_sql

with print_sql(count_only=False):
  users = User.objects.select_related('seller').filter(is_active=True, seller_id__isnull=False).only('seller__email')
  for u in users.iterator():
    email = u.seller.email
    send_email(email)

putting select_related before objects manager. By doing this django retreives seller model records as a field for every iterated user and will not hit the database in every iteration.

  • Related