Creating Custom Join in Django

Time:05-07

I am struggling to create the correct prefetch behavior in Django. Here is the outline of the problem:

  • Each Account has DailyQuotes, updated daily at different times (think of each one as a snapshot)
  • I need to query those DailyQuotes and get only the most recent quotes for each account

Here are the models:

class Account(models.Model):
    name = models.TextField(default="")
    ...

class DailyQuotes(models.Model):
    account = models.ForeignKey(Account, related_name="quote", on_delete=models.CASCADE)
    date = models.DateField(default=None)
    ...

Currently the query inside of my view for this looks like:

from django.db.models import Max, Prefetch

acc_ids = [1, 2, 3]
# Latest date across ALL quotes, not per account
max_date = DailyQuotes.objects.aggregate(Max("date"))["date__max"]
accounts = (
    Account.objects.filter(id__in=acc_ids)
    .prefetch_related(
        Prefetch(
            "quote",
            queryset=DailyQuotes.objects.filter(date=max_date),
        ),
    )
)
# Feed into serializer, etc

This works and generates 3 queries: 1 for the max date, 1 for the accounts, and 1 for the quotes. The problem is that if one account has more up-to-date DailyQuotes than the others, the other accounts return no quotes, because their latest date is earlier than the global max date. So I need the latest DailyQuotes for each account based on the max date for that account, not the max date across all accounts.

I have generated the SQL query that does what I want, but turning it into Django code has been giving me issues. I could execute the raw SQL but I would like to keep it in Django. Here is what the current SQL looks like and what it needs to be:

Current prefetch query (generated by Django):

SELECT ... FROM dailyquotes
    WHERE (dailyquotes.date = 2022-05-05
           AND dailyquotes.account_id IN (1,2,3))

Required prefetch query (or similar):

SELECT ... FROM dailyquotes dq
    JOIN (SELECT account_id, MAX(date) AS date__max FROM dailyquotes
          WHERE account_id IN (1,2,3) GROUP BY account_id) dates
    ON dq.account_id = dates.account_id AND dq.date = dates.date__max
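
To check that the join really returns one latest quote per account, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. The table layout and the sample rows are assumptions made up for illustration (dates are stored as ISO strings so that MAX compares them correctly); the two queries mirror the "current" and "required" SQL above.

```python
import sqlite3

# Hypothetical sample data: account 1 was refreshed on 2022-05-05,
# accounts 2 and 3 only up to 2022-05-04.
rows = [
    (1, "2022-05-04"), (1, "2022-05-05"),
    (2, "2022-05-03"), (2, "2022-05-04"),
    (3, "2022-05-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dailyquotes (id INTEGER PRIMARY KEY, account_id INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO dailyquotes (account_id, date) VALUES (?, ?)", rows)

# Current behaviour: one global max date applied to every account.
global_max = conn.execute("SELECT MAX(date) FROM dailyquotes").fetchone()[0]
current = conn.execute(
    "SELECT account_id, date FROM dailyquotes "
    "WHERE date = ? AND account_id IN (1, 2, 3) ORDER BY account_id",
    (global_max,),
).fetchall()

# Required behaviour: join against each account's own max date.
required = conn.execute(
    """
    SELECT dq.account_id, dq.date FROM dailyquotes dq
    JOIN (SELECT account_id, MAX(date) AS date__max FROM dailyquotes
          WHERE account_id IN (1, 2, 3) GROUP BY account_id) dates
    ON dq.account_id = dates.account_id AND dq.date = dates.date__max
    ORDER BY dq.account_id
    """
).fetchall()

print(current)   # only account 1 has a quote on the global max date
print(required)  # one latest quote per account
```

With this data the first query only finds a quote for account 1, while the join returns a row for each of the three accounts.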

Any help would be greatly appreciated!

EDIT:

With some help from SamSparx, here is what I came up with:

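from django.db.models import Max, Prefetch

acc_ids = [1, 2, 3]
# One row per account: {"account_id": ..., "max_date": ...}
max_dates = (
    DailyQuotes.objects.filter(account_id__in=acc_ids)
    .values("account_id")
    .annotate(max_date=Max("date"))
)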

recordsets = None

# get the recordsets
for max_date in max_dates:
    qs = DailyQuotes.objects.filter(
        account_id=max_date["account_id"], 
        date=max_date["max_date"])
        
    if recordsets is None:
        recordsets = qs
    else:
        recordsets = recordsets | qs

accounts = (
    Account.objects.filter(id__in=acc_ids)
    .prefetch_related(
        Prefetch(
            "quote",
            queryset=recordsets,
        ),
    )
)
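
For reference, chaining querysets with | ORs their filters together, so the prefetch should end up with roughly one (account_id, date) condition per account in its WHERE clause. The sketch below reproduces that two-step shape with plain sqlite3 instead of the ORM; the table, the sample rows and the variable names are assumptions for illustration, and the SQL Django actually emits will look different in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dailyquotes (id INTEGER PRIMARY KEY, account_id INTEGER, date TEXT)"
)
# Hypothetical sample rows, dates stored as ISO strings.
conn.executemany(
    "INSERT INTO dailyquotes (account_id, date) VALUES (?, ?)",
    [(1, "2022-05-04"), (1, "2022-05-05"), (2, "2022-05-04"), (3, "2022-05-03")],
)

acc_ids = [1, 2, 3]
placeholders = ", ".join("?" for _ in acc_ids)

# Step 1: max date per account (the values().annotate(Max(...)) query).
max_dates = conn.execute(
    f"SELECT account_id, MAX(date) FROM dailyquotes "
    f"WHERE account_id IN ({placeholders}) GROUP BY account_id",
    acc_ids,
).fetchall()

# Step 2: one OR-ed (account_id, date) condition per account, mirroring
# what combining the per-account querysets with | does to the WHERE clause.
latest = []
if max_dates:  # with no quotes at all there is nothing to OR together
    where = " OR ".join("(account_id = ? AND date = ?)" for _ in max_dates)
    params = [value for pair in max_dates for value in pair]
    latest = conn.execute(
        f"SELECT account_id, date FROM dailyquotes WHERE {where} ORDER BY account_id",
        params,
    ).fetchall()

print(latest)
```

One thing to keep in mind with this shape is that the WHERE clause grows by one condition per account, so it is probably best suited to a modest number of account IDs.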

CodePudding user response:

If you are not tied to prefetch_related, you can do it in Django via DailyQuotes in two queries: one to gather the max date per account and one for the final recordset (with select_related if you want the accompanying account info).

from django.db.models import Max

# Account IDs to look up
acc_ids = [0, 1, 2]
recordsets = []

# Get the max date for each account ID
max_dates = (
    DailyQuotes.objects.filter(account_id__in=acc_ids)
    .values("account_id")
    .annotate(max_date=Max("date"))
)

# Build one (lazy) queryset per account, matching that account's latest date
for max_date in max_dates:
    qs = DailyQuotes.objects.filter(account_id=max_date["account_id"], date=max_date["max_date"])
    # Add .select_related("account") to qs if you need the associated account info
    recordsets.append(qs)

# Combine the recordsets for serialising. Accounts without quotes produce no
# entry in max_dates, so handle any list length, including an empty one.
final_recordset = DailyQuotes.objects.none()
if recordsets:
    first, *rest = recordsets
    final_recordset = first.union(*rest) if rest else first