I am struggling to create the correct prefetch behavior in Django. Here is the outline of the problem:
- Each Account has DailyQuotes, updated daily at different times (think snapshot)
- Need to query all of those DailyQuotes, and only get the most recent quotes for each account
Here are the models:
class Account(models.Model):
name = models.TextField(default="")
...
class DailyQuotes(models.Model):
account = models.ForeignKey(Account, related_name="quote", on_delete=models.CASCADE)
date = models.DateField(default=None)
...
Currently the query inside of my view for this looks like:
acc_ids = [1,2,3]
max_date = DailyQuotes.objects.aggregate(Max("date"))["date__max"]
accounts = (
Account.objects.filter(id__in=acc_ids)
.prefetch_related(
Prefetch(
"quote",
queryset=DailyQuotes.objects.filter(date=date),
),
)
)
# Feed into serializer, etc
This works and generates 3 queries: 1 for the max date, 1 for accounts, and 1 for the quotes. The problem with this solution is that if one account has more up to date DailyQuotes, then the other accounts will return no quotes. So I need to get the latest DailyQuotes for each account based on the max date for that account, not all accounts.
I have generated the SQL query that does what I want, but turning it into Django code has been giving me issues. I could execute the raw SQL but I would like to keep it in Django. Here is what the current SQL looks like and what it needs to be:
Current prefetch query (generated by Django):
SELECT ... FROM dailyquotes
WHERE (dailyquotes.date = 2022-05-05
AND dailyquotes.account_id IN (1,2,3))
Required prefetch query (or similar):
SELECT ... FROM dailyquotes dq
JOIN (SELECT account_id, MAX(date) AS date__max FROM dailyquotes
WHERE account_id in (1,2,3) group by account_id) dates
ON dq.account_id = dates.account_id AND dq.date = dates.date__max
Any help would be greatly appreciated!
EDIT:
With some help from SamSparx, here is what I came up with:
acc_ids = [1,2,3]
max_dates = (DailyQuotes.objects.filter(account_id__in=acc_ids)
.values("account_id")
.annotate(max_date=Max("date")))
recordsets = None
# get the recordsets
for max_date in max_dates:
qs = DailyQuotes.objects.filter(
account_id=max_date["account_id"],
date=max_date["max_date"])
if recordsets is None:
recordsets = qs
else:
recordsets = recordsets | qs
accounts = (Account.objects.filter(
id__in=permissions["acc_ids"].values())
.prefetch_related(
Prefetch(
"quote",
queryset=recordsets,
),
)
)
CodePudding user response:
If you are not tied to prefetch_related you can do it in Django via DailyQuotes in 2 calls - 1 to gather the max dates and 1 for the final recordset (even using select_related if you want accompanying account info).
from django.db.models import Max
#define lists
acc_ids = [0,1,2]
max_dates = []
recordsets = []
final_recordset = []
#get the max date for each account ID
max_dates = DailyQuotes.objects.filter(account_id__in=acc_ids).values('account_id').annotate(max_date = Max('date'))
#get the recordsets
for max_date in max_dates:
qs = DailyQuotes.objects.filter(account_id = max_date['account_id'], date = max_date['max_date'] )
#qs = DailyQuotes.objects.filter(account_id = max_date['account_id'], date = max_date['max_date']).select_related('account') if you need associated account info
recordsets.append(qs)
#combine the recordsets for serialising - you may want to modify this based on length of recordsets list (in case of empty accounts) for robustness
final_recordset = recordsets[0].union( recordsets[1], recordsets[2])