Home > other >  Django order_by query runs incredibly slow in Python, but fast in DB
Django order_by query runs incredibly slow in Python, but fast in DB

Time:02-01

I have the following models:

class Shelf(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=200, db_index=True)
    slug = models.SlugField(max_length=200, editable=False)
    games = models.ManyToManyField(Game, blank=True, through='SortedShelfGames')
    objects = ShelfManager()
    description = models.TextField(blank=True, null=True)

class SortedShelfGames(models.Model):
    game = models.ForeignKey(Game, on_delete=models.CASCADE)
    shelf = models.ForeignKey(Shelf, on_delete=models.CASCADE)
    date_added = models.DateTimeField()
    order = models.IntegerField(blank=True, null=True)
    releases = models.ManyToManyField(Release)
    objects = SortedShelfGamesManager.as_manager()

class Game(models.Model):
    name = models.CharField(max_length=300, db_index=True)
    sort_name = models.CharField(max_length=300, db_index=True)
    ...

I have a view where I want to get all of a user's SortedShelfGames, distinct on the Game relationship. I then want to be able to sort that list of SortedShelfGames on a few different fields. So right now, I'm doing the following inside of the SortedShelfGamesManager (which inherits from models.QuerySet) to get the list:

games = self.filter(
   pk__in=Subquery(
      self.filter(shelf__user=user).distinct('game').order_by('game', 'date_added').values('pk') # The order_by statement in here is to get the earliest date_added field for display
   )
)

That works the way it's supposed to. However, whenever I try and do an order_by('game__sort_name'), the query takes forever in my python. When I'm actually trying to use it on my site, it just times out. If I take the generated SQL and just run it on my database, it returns all of my results in a fraction of a second. I can't figure out what I'm doing wrong here. The SortedShelfGames table has millions of records in it if that matters.

This is the generated SQL:

SELECT 
    "collection_sortedshelfgames"."id", "collection_sortedshelfgames"."game_id", "collection_sortedshelfgames"."shelf_id", "collection_sortedshelfgames"."date_added", "collection_sortedshelfgames"."order", 
    (SELECT U0."rating" FROM "reviews_review" U0 WHERE (U0."game_id" = "collection_sortedshelfgames"."game_id" AND U0."user_id" = 1 AND U0."main") LIMIT 1) AS "score", 
    "games_game"."id", "games_game"."created", "games_game"."last_updated", "games_game"."exact", "games_game"."date", "games_game"."year", "games_game"."quarter", "games_game"."month", "games_game"."name", "games_game"."sort_name", "games_game"."rating_id", "games_game"."box_art", "games_game"."description", "games_game"."slug", "games_game"."giantbomb_id", "games_game"."ignore_giantbomb", "games_game"."ignore_front_page", "games_game"."approved", "games_game"."user_id", "games_game"."last_edited_by_id", "games_game"."dlc", "games_game"."parent_game_id" 
FROM 
    "collection_sortedshelfgames" 
INNER JOIN 
    "games_game" 
ON 
    ("collection_sortedshelfgames"."game_id" = "games_game"."id") 
WHERE 
    "collection_sortedshelfgames"."id" 
IN (
    SELECT 
        DISTINCT ON (U0."game_id") U0."id" 
    FROM 
        "collection_sortedshelfgames" U0 
    INNER JOIN 
        "collection_shelf" U1 ON (U0."shelf_id" = U1."id") 
    WHERE 
        U1."user_id" = 1 
    ORDER 
        BY U0."game_id" ASC, U0."date_added" ASC
) 
ORDER BY
    "games_game"."sort_name" ASC

CodePudding user response:

I think you don't need a Subquery for this.

CodePudding user response:

Here's what I ended up doing to solve this. Instead of using a Subquery, I created a list of primary keys by evaluating what I was using as the Subquery in, then feeding that into my query. It looks like this:

pks = list(self.filter(shelf__user=user).distinct('game').values_list('pk', flat=True))
games = self.filter(
   pk__in=pks)
)
games = games.order_by('game__sort_name')

This ended up being pretty fast. This is essentially the same thing as the Subquery method, but whatever was going on underneath the hood in python/Django was slowing this way down.

  •  Tags:  
  • Related