I have students associated to schools, and want to find all schools that have five or fewer (including zero) students that have has_mohawk = false
.
Here's an Activerecord query:
School.joins(:students)
.group(:id)
.having("count(students.id) < 5")
.where(students: {has_mohawk: true})
This works for schools with 1 - 4 such students with mohawks, but omits schools where there are no such students!
I figured out a working solution and will post it. But I am interested in a more elegant solution.
Using Rails 5. I'm also curious whether Rails 6's missing
could handle this?
CodePudding user response:
In addition to the first query, write another to find schools where no students have mohawks (works in Rails 5).
School.left_outer_joins(:students)
.group(:id)
.having("max(has_mohawk::Integer) = 0")
You might think from this popular answer that you could instead just write:
School.left_outer_joins(:students)
.group(:id)
.where.not(student: {has_mohawk: true})
But that will include (at least in Rails 5) schools where there is any student with a has_mohawk value of false, even if some students have a has_mohawk value of true.
Explanation of max(has_mohawk::Integer) = 0
It converts the boolean to an integer (1 for true, 0 for false). Schools with any true values will have a max of 1, and can thus be filtered out.
Similiar: SQL: Select records where ALL joined records satisfy some condition
CodePudding user response:
find all schools that have five or fewer (including zero) students that have
has_mohawk = false
.
Here is an optimized SQL solution. SQL is what it comes down to in any case. (ORMs like Active Record are limited in their capabilities.)
SELECT sc.*
FROM schools sc
LEFT JOIN (
SELECT school_id
FROM students
WHERE has_mohawk = false
GROUP BY 1
HAVING count(*) >= 5
) st ON st.school_id = sc.id
WHERE st.school_id IS NULL; -- "not disqualified"
While involving all rows, aggregate before joining. That's faster. This query takes the reverse approach by excluding schools with 5 or more qualifying students. The rest is your result - incl. schools with 0 qualifying students. See:
Any B-tree index on students (school_id)
can support this query, but this partial multicolumn index would be perfect:
CREATE INDEX ON students (school_id) WHERE has_mohawk = false;
If there can be many students per school, this is faster:
SELECT sc.*
FROM schools sc
JOIN LATERAL (
SELECT count(*) < 5 AS qualifies
FROM (
SELECT -- select list can be empty (cheapest)
FROM students st
WHERE st.school_id = sc.id
AND st.has_mohawk = false
LIMIT 5 -- !
) st1
) st2 ON st2.qualifies;
The point is not to count all qualifying students, but stop looking once we found 5. Since the join to the LATERAL
subquery with an aggregate function always returns a row (as opposed to the join in the first query), schools without qualifying students are kept in the loop, and we don't need the reverse approach.
About LATERAL
: