Home > front end >  Find rows with fewer than X associations (including 0)
Find rows with fewer than X associations (including 0)

Time:08-20

I have students associated to schools, and want to find all schools that have five or fewer (including zero) students that have has_mohawk = false.

Here's an Activerecord query:

School.joins(:students)
      .group(:id)
      .having("count(students.id) < 5")
      .where(students: {has_mohawk: true})

This works for schools with 1 - 4 such students with mohawks, but omits schools where there are no such students!

I figured out a working solution and will post it. But I am interested in a more elegant solution.

Using Rails 5. I'm also curious whether Rails 6's missing could handle this?

CodePudding user response:

In addition to the first query, write another to find schools where no students have mohawks (works in Rails 5).

School.left_outer_joins(:students)
      .group(:id)
      .having("max(has_mohawk::Integer) = 0")

You might think from this popular answer that you could instead just write:

School.left_outer_joins(:students)
      .group(:id)
      .where.not(student: {has_mohawk: true})

But that will include (at least in Rails 5) schools where there is any student with a has_mohawk value of false, even if some students have a has_mohawk value of true.

Explanation of max(has_mohawk::Integer) = 0

It converts the boolean to an integer (1 for true, 0 for false). Schools with any true values will have a max of 1, and can thus be filtered out.

Similiar: SQL: Select records where ALL joined records satisfy some condition

CodePudding user response:

find all schools that have five or fewer (including zero) students that have has_mohawk = false.

Here is an optimized SQL solution. SQL is what it comes down to in any case. (ORMs like Active Record are limited in their capabilities.)

SELECT sc.*
FROM   schools sc
LEFT   JOIN (
   SELECT school_id
   FROM   students
   WHERE  has_mohawk = false
   GROUP  BY 1
   HAVING count(*) >= 5
   ) st ON st.school_id = sc.id
WHERE  st.school_id IS NULL;  -- "not disqualified"

While involving all rows, aggregate before joining. That's faster. This query takes the reverse approach by excluding schools with 5 or more qualifying students. The rest is your result - incl. schools with 0 qualifying students. See:

Any B-tree index on students (school_id) can support this query, but this partial multicolumn index would be perfect:

CREATE INDEX ON students (school_id) WHERE has_mohawk = false;

If there can be many students per school, this is faster:

SELECT sc.*
FROM   schools sc
JOIN   LATERAL (
   SELECT count(*) < 5 AS qualifies
   FROM  (
      SELECT  -- select list can be empty (cheapest)
      FROM   students st
      WHERE  st.school_id = sc.id
      AND    st.has_mohawk = false
      LIMIT  5  -- !
      ) st1
   ) st2 ON st2.qualifies;

The point is not to count all qualifying students, but stop looking once we found 5. Since the join to the LATERAL subquery with an aggregate function always returns a row (as opposed to the join in the first query), schools without qualifying students are kept in the loop, and we don't need the reverse approach.

About LATERAL:

  • Related