Home > other >  How to select all instance of an ID from a reference table based on whether it contains at least one
How to select all instance of an ID from a reference table based on whether it contains at least one

Time:04-03

I think my question is best put analogously through the example below.

Suppose I have the following table that contains information about classes and students in a school.

class_id student_id
1 3
1 6
1 2
1 6
2 4
2 7
3 6
3 3
3 2
3 5
3 1

I would like to retrieve all students and classes that contain the students with ID's 3 and 6, i.e. my result would be:

class_id student_id
1 3
1 6
1 2
1 6
3 6
3 3
3 2
3 5
3 1

This can be achieved with the following query...

SELECT 
    *
FROM
    table_name
WHERE
    class_id IN (SELECT 
            class_id
        FROM
            table_name
        WHERE
            student_id IN (3 , 6)
        GROUP BY class_id
        HAVING COUNT(DISTINCT student_id) = 2);

I was wondering whether I could achieve the above without the use of a subquery? Speed is of the upmost importance and so I would like to minimise the duration of this select query.

CodePudding user response:

Attempting to avoid sub-queries can lead you down the wrong path. You can indeed write bad queries with sub-queries, but you can write bad queries without them too, as well as perfectly good queries with them.

In your case, the sub-query is being used as a separate scope to find the list of classes of interest. Doing that in the same scope as you also get those classes' students is likely to be messy and inefficient.

For example, an alternative to using IN() still requires a sub-query...

SELECT 
  student.*
FROM
  table_name   AS student
INNER JOIN
(
  SELECT 
    class_id
  FROM
    table_name
  WHERE
    student_id IN (3 , 6)
  GROUP BY
    class_id
  HAVING
    COUNT(DISTINCT student_id) = 2
)
  AS class
    ON class.class_id = student.class_id

That, however, may yield a similar (or even identical) plan to the query you already have.

The closest I can get to "without a sub-query" is to use a nested query, which you can argue is still a sub-query but is easier for the optimiser to expand.

(Uses MySQL 8 )

SELECT
  *
FROM
(
  SELECT
    *,
    SUM(CASE WHEN student_id IN (3,6) THEN 1 END)
      OVER (PARTITION BY class_id)
        AS target_member_count
  FROM
    table_name
)
  scanned
WHERE
  target_member_count = 2

I would expect that to be slower in the case that most classes don't contain students 3 and/or 6. But you could try it.

  • Related