I have the following models:
class Shelf(models.Model):
user = models.ForeignKey(User, on_delete=models.CASCADE)
name = models.CharField(max_length=200, db_index=True)
slug = models.SlugField(max_length=200, editable=False)
games = models.ManyToManyField(Game, blank=True, through='SortedShelfGames')
objects = ShelfManager()
description = models.TextField(blank=True, null=True)
class SortedShelfGames(models.Model):
game = models.ForeignKey(Game, on_delete=models.CASCADE)
shelf = models.ForeignKey(Shelf, on_delete=models.CASCADE)
date_added = models.DateTimeField()
order = models.IntegerField(blank=True, null=True)
releases = models.ManyToManyField(Release)
objects = SortedShelfGamesManager.as_manager()
class Game(models.Model):
name = models.CharField(max_length=300, db_index=True)
sort_name = models.CharField(max_length=300, db_index=True)
...
I have a view where I want to get all of a user's SortedShelfGames, distinct on the Game relationship. I then want to be able to sort that list of SortedShelfGames on a few different fields. So right now, I'm doing the following inside of the SortedShelfGamesManager (which inherits from models.QuerySet) to get the list:
games = self.filter(
pk__in=Subquery(
self.filter(shelf__user=user).distinct('game').order_by('game', 'date_added').values('pk') # The order_by statement in here is to get the earliest date_added field for display
)
)
That works the way it's supposed to. However, whenever I try and do an order_by('game__sort_name'), the query takes forever in my python. When I'm actually trying to use it on my site, it just times out. If I take the generated SQL and just run it on my database, it returns all of my results in a fraction of a second. I can't figure out what I'm doing wrong here. The SortedShelfGames table has millions of records in it if that matters.
This is the generated SQL:
SELECT
"collection_sortedshelfgames"."id", "collection_sortedshelfgames"."game_id", "collection_sortedshelfgames"."shelf_id", "collection_sortedshelfgames"."date_added", "collection_sortedshelfgames"."order",
(SELECT U0."rating" FROM "reviews_review" U0 WHERE (U0."game_id" = "collection_sortedshelfgames"."game_id" AND U0."user_id" = 1 AND U0."main") LIMIT 1) AS "score",
"games_game"."id", "games_game"."created", "games_game"."last_updated", "games_game"."exact", "games_game"."date", "games_game"."year", "games_game"."quarter", "games_game"."month", "games_game"."name", "games_game"."sort_name", "games_game"."rating_id", "games_game"."box_art", "games_game"."description", "games_game"."slug", "games_game"."giantbomb_id", "games_game"."ignore_giantbomb", "games_game"."ignore_front_page", "games_game"."approved", "games_game"."user_id", "games_game"."last_edited_by_id", "games_game"."dlc", "games_game"."parent_game_id"
FROM
"collection_sortedshelfgames"
INNER JOIN
"games_game"
ON
("collection_sortedshelfgames"."game_id" = "games_game"."id")
WHERE
"collection_sortedshelfgames"."id"
IN (
SELECT
DISTINCT ON (U0."game_id") U0."id"
FROM
"collection_sortedshelfgames" U0
INNER JOIN
"collection_shelf" U1 ON (U0."shelf_id" = U1."id")
WHERE
U1."user_id" = 1
ORDER
BY U0."game_id" ASC, U0."date_added" ASC
)
ORDER BY
"games_game"."sort_name" ASC
CodePudding user response:
I think you don't need a Subquery
for this.
CodePudding user response:
Here's what I ended up doing to solve this. Instead of using a Subquery, I created a list of primary keys by evaluating what I was using as the Subquery in, then feeding that into my query. It looks like this:
pks = list(self.filter(shelf__user=user).distinct('game').values_list('pk', flat=True))
games = self.filter(
pk__in=pks)
)
games = games.order_by('game__sort_name')
This ended up being pretty fast. This is essentially the same thing as the Subquery method, but whatever was going on underneath the hood in python/Django was slowing this way down.