See Filtering unique values for the problem description, sample data and postgres query. I'd like to convert the SQL to a queryset. I feel like I'm close but not quite.
SELECT Column_A, Column_B, Column_C, 0 as RN
FROM TABLE
WHERE COLUMN_C is null and Column_B in (UserA, UserB, UserC)
UNION ALL
SELECT Column_A, Column_B, Column_C, RN
FROM (
SELECT A.*, ROW_NUMBER() over (partition by A.column_C Order by case A.column_B when 'UserA' then 0 else 1 end, U.Time_Created) rn
FROM Table A
INNER JOIN user U
on U.Column_B = A.Column_B
WHERE A.Column_C is not null and ColumnB in (userA, userB, UserC)) B
WHERE RN = 1
This is what I have so far:
qs1 = Table.objects.filter(Column_C__isnull=True).annotate(rn=Value(0))
qs2 = Table.objects.annotate(rn=Window(
expression=RowNumber(),
partition_by=[Column_C],
order_by=[Case(When(Column_B=UserA, then=0), default=1), 'Table_for_Column_B__time_created']
)).filter(Column_C__isnull=False, rn=1)
return qs2.union(qs1)
This doesn't quite work.
django.db.utils.NotSupportedError: Window is disallowed in the filter clause.
Next, I tried pulling the intermediate result in a subquery, to allow for filtering in the outer query, since I only really need rows with row number = 1.
qs1 = Table.objects.filter(Column_C__isnull=True).annotate(rn=Value(0))
qs2 = Table.objects.annotate(rn=Window(
expression=RowNumber(),
partition_by=[Column_C],
order_by=[Case(When(Column_B=UserA, then=0), default=1), 'Table_for_Column_B__time_created']
)).filter(pk=OuterRef('pk'))
qs3 = Table.objects.annotate(rn=Subquery(qs2.values('rn'))).filter(Column_C__isnull=False, rn=1)
return qs3.union(q1)
No exceptions this time, but this doesn't work. Every row in the table gets row_number=1 annotated. From the original example, the queryset returns all 7 rows instead of filtering to 5.
- Is it possible to filter on window expressions?
- What's the best practices to keep in mind when converting window queries to subqueries?
- Is there a better way to structure the queryset?
CodePudding user response:
You should be able to do this without a window expression, using a SubQuery
First create a queryset for the subquery that orders by the Column_B=UserA
match and then time_created
from django.db.models import Case, When, Q, Subquery, OuterRef
tables_ordered = Table.objects.filter(
Column_C=OuterRef('Column_C')
).annotate(
user_match=Case(When(Column_B=UserA, then=0), default=1)
).order_by('user_match', 'time_created')
Then this subquery returns the first pk for the matched Column_C
from the OuterRef
, similar to selecting the first row from your window function
first_pk_for_each_column_c = Subquery(tables_ordered.values('pk')[:1])
Then use two Q objects to create an OR that selects the row if Column_C
is NULL or the pk
matches the first pk
from the subquery
Table.objects.filter(
Q(Column_C__isnull=True) | Q(pk=first_pk_for_each_column_c)
)