Django: It takes a long time to filter the m2m model from the m2m-connected model by specifying the

Time:12-22

The m2m through table has about 1.4 million rows.

I suspect the slowdown comes from the large number of rows, but I believe the queryset itself is written correctly. What do you think is the cause?

The query takes about 400-1000 ms.

Filtering by pk instead of name is not nearly as slow.

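# models.py
import uuid

from django.db import models
from django.utils import timezone

# Note: Source (used by Video.sources) is not shown in the question.
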
class Tag(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(unique=True, max_length=30)
    created_at = models.DateTimeField(default=timezone.now)


class Video(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    title = models.CharField(max_length=300)
    thumbnail_url = models.URLField(max_length=1000)
    preview_url = models.URLField(max_length=1000, blank=True, null=True)
    embed_url = models.URLField(max_length=1000)
    sources = models.ManyToManyField(Source)
    duration = models.CharField(max_length=6)
    tags = models.ManyToManyField(Tag, blank=True, db_index=True)
    views = models.PositiveIntegerField(default=0, db_index=True)
    is_public = models.BooleanField(default=True)
    published_at = models.DateTimeField(default=timezone.now, db_index=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

Queryset

Video.objects.filter(tags__name='word').only('id').order_by('-published_at')

Query issued

SELECT "videos_video"."id"
FROM "videos_video"
INNER JOIN "videos_video_tags" ON ("videos_video"."id" = "videos_video_tags"."video_id")
INNER JOIN "videos_tag" ON ("videos_video_tags"."tag_id" = "videos_tag"."id")
WHERE "videos_tag"."name" = 'word'
ORDER BY "videos_video"."published_at" DESC;

EXPLAIN(ANALYZE, VERBOSE, BUFFERS)

                                                                                                                                       QUERY PLAN                                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4225.63..4226.23 rows=241 width=24) (actual time=456.321..473.827 rows=135178 loops=1)
   Output: videos_video.id, videos_video.published_at
   Sort Key: videos_video.published_at DESC
   Sort Method: external merge  Disk: 4504kB
   Buffers: shared hit=540568 read=11368, temp read=563 written=566
   ->  Nested Loop  (cost=20.45..4216.10 rows=241 width=24) (actual time=5.538..398.841 rows=135178 loops=1)
         Output: videos_video.id, videos_video.published_at
         Inner Unique: true
         Buffers: shared hit=540568 read=11368
         ->  Nested Loop  (cost=20.02..4102.13 rows=241 width=16) (actual time=5.513..76.291 rows=135178 loops=1)
               Output: videos_video_tags.video_id
               Buffers: shared hit=2 read=11222
               ->  Index Scan using videos_tag_name_620230b0_like on public.videos_tag  (cost=0.28..8.30 rows=1 width=16) (actual time=0.020..0.022 rows=1 loops=1)
                     Output: videos_tag.id, videos_tag.name, videos_tag.is_actress, videos_tag.created_at
                     Index Cond: ((videos_tag.name)::text = 'word'::text)
                     Buffers: shared hit=1 read=2
               ->  Bitmap Heap Scan on public.videos_video_tags  (cost=19.74..4079.23 rows=1460 width=32) (actual time=5.489..62.122 rows=135178 loops=1)
                     Output: videos_video_tags.id, videos_video_tags.video_id, videos_video_tags.tag_id
                     Recheck Cond: (videos_video_tags.tag_id = videos_tag.id)
                     Heap Blocks: exact=11112
                     Buffers: shared hit=1 read=11220
                     ->  Bitmap Index Scan on videos_video_tags_tag_id_2673cfc8  (cost=0.00..19.38 rows=1460 width=0) (actual time=4.215..4.215 rows=135178 loops=1)
                           Index Cond: (videos_video_tags.tag_id = videos_tag.id)
                           Buffers: shared hit=1 read=108
         ->  Index Scan using videos_video_pkey on public.videos_video  (cost=0.42..0.47 rows=1 width=24) (actual time=0.002..0.002 rows=1 loops=135178)
               Output: videos_video.id, videos_video.title, videos_video.thumbnail_url, videos_video.preview_url, videos_video.embed_url, videos_video.duration, videos_video.views, videos_video.is_public, videos_video.published_at, videos_video.created_at, videos_video.updated_at
               Index Cond: (videos_video.id = videos_video_tags.video_id)
               Buffers: shared hit=540566 read=146
 Planning:
   Buffers: shared hit=33 read=13
 Planning Time: 0.991 ms
 Execution Time: 481.274 ms
(32 rows)

Time: 482.869 ms
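
Reading the timings straight off the plan above gives a rough breakdown of where the time goes. The short Python sketch below only does arithmetic on figures copied from the EXPLAIN output; the variable names are made up for illustration, and per-node timings in EXPLAIN ANALYZE include instrumentation overhead, so treat the results as approximate.

```python
# Figures copied from the EXPLAIN (ANALYZE) output above.
estimated_rows = 241         # planner's row estimate for the join result
actual_rows = 135178         # rows the join actually produced

inner_join_end_ms = 76.291   # inner Nested Loop (tag -> through table) total time
outer_join_end_ms = 398.841  # outer Nested Loop (-> videos_video_pkey) total time
sort_end_ms = 473.827        # Sort total time

# How far off the planner's row estimate was.
misestimate_factor = actual_rows / estimated_rows

# Time attributable to a node = its total time minus its input's total time.
pkey_lookup_ms = outer_join_end_ms - inner_join_end_ms
sort_ms = sort_end_ms - outer_join_end_ms

print(f"row estimate off by ~{misestimate_factor:.0f}x")
print(f"videos_video primary-key lookups: ~{pkey_lookup_ms:.0f} ms")
print(f"sort: ~{sort_ms:.0f} ms")
```

On these numbers, most of the time goes to the 135,178 primary-key lookups on videos_video, and the sort spills to disk (external merge, 4504kB). The planner expected only 241 rows, which likely explains why it chose a nested loop.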

CodePudding user response:

Does your database have EXACTLY these indexes?

  1. "videos_tag" ("name", "id")
  2. "videos_video_tags" ("tag_id", "video_id")
  3. "videos_video" ("id", "published_at")

If not, try adding them!
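
As a rough sketch of what those three indexes look like as DDL, the snippet below creates them with Python's built-in sqlite3 module. SQLite is only a stand-in so the example runs anywhere; the question uses PostgreSQL, the index names are hypothetical, and the table definitions are a trimmed-down assumption based on the SQL in the question. Whether the indexes actually help has to be checked with EXPLAIN on the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Trimmed-down tables (assumption: only the columns the query touches).
    CREATE TABLE videos_tag (id TEXT PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE videos_video (id TEXT PRIMARY KEY, published_at TEXT);
    CREATE TABLE videos_video_tags (
        id INTEGER PRIMARY KEY,
        video_id TEXT REFERENCES videos_video (id),
        tag_id TEXT REFERENCES videos_tag (id)
    );

    -- The three composite indexes from the answer (index names are made up).
    CREATE INDEX ix_tag_name_id ON videos_tag (name, id);
    CREATE INDEX ix_video_tags_tag_video ON videos_video_tags (tag_id, video_id);
    CREATE INDEX ix_video_id_published ON videos_video (id, published_at);
    """
)

# List the explicitly created indexes to confirm they exist.
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'ix_%'"
    )
)
print(index_names)
```

In a Django project, indexes on Tag and Video can be declared through Meta.indexes, while the auto-created through table needs either an explicit through model or a RunSQL migration.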

CodePudding user response:

I solved the problem using the method described in Iain Shelvington's comment.

Tag.objects.get(name='word').video_set.order_by('-published_at')
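
The reason this helps is probably that the ORM now runs two queries: one that fetches the tag by name, and a second one that filters the through table by that tag's primary key, which matches the earlier observation that filtering by pk is fast. The sketch below mimics those two steps with Python's built-in sqlite3 module; SQLite, the sample rows, and the helper name videos_for_tag are assumptions for illustration, not the SQL Django emits verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE videos_tag (id TEXT PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE videos_video (id TEXT PRIMARY KEY, published_at TEXT);
    CREATE TABLE videos_video_tags (
        id INTEGER PRIMARY KEY, video_id TEXT, tag_id TEXT
    );
    -- Made-up sample rows for illustration only.
    INSERT INTO videos_tag VALUES ('t1', 'word'), ('t2', 'other');
    INSERT INTO videos_video VALUES
        ('v1', '2021-01-01'), ('v2', '2021-03-01'), ('v3', '2021-02-01');
    INSERT INTO videos_video_tags (video_id, tag_id) VALUES
        ('v1', 't1'), ('v2', 't1'), ('v3', 't2');
    """
)


def videos_for_tag(name):
    """Return video ids for a tag name, newest first, using two queries."""
    # Step 1: resolve the tag name to its primary key (the Tag.objects.get part).
    # Unlike Tag.objects.get, this hypothetical helper returns [] when no tag matches.
    row = conn.execute(
        "SELECT id FROM videos_tag WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return []
    # Step 2: filter the through table by the known tag_id (the video_set part).
    rows = conn.execute(
        """
        SELECT v.id
        FROM videos_video AS v
        JOIN videos_video_tags AS vt ON vt.video_id = v.id
        WHERE vt.tag_id = ?
        ORDER BY v.published_at DESC
        """,
        (row[0],),
    )
    return [r[0] for r in rows]


print(videos_for_tag("word"))
```

With the tag's id passed as a literal value, PostgreSQL can estimate the number of matching rows from its statistics for that value, which may lead it to a better plan than the single three-table join.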