I think my question is best put analogously through the example below.
Suppose I have the following table that contains information about classes and students in a school.
class_id | student_id |
---|---|
1 | 3 |
1 | 6 |
1 | 2 |
1 | 6 |
2 | 4 |
2 | 7 |
3 | 6 |
3 | 3 |
3 | 2 |
3 | 5 |
3 | 1 |
I would like to retrieve all students and classes that contain the students with ID's 3 and 6, i.e. my result would be:
class_id | student_id |
---|---|
1 | 3 |
1 | 6 |
1 | 2 |
1 | 6 |
3 | 6 |
3 | 3 |
3 | 2 |
3 | 5 |
3 | 1 |
This can be achieved with the following query...
SELECT
*
FROM
table_name
WHERE
class_id IN (SELECT
class_id
FROM
table_name
WHERE
student_id IN (3 , 6)
GROUP BY class_id
HAVING COUNT(DISTINCT student_id) = 2);
I was wondering whether I could achieve the above without the use of a subquery? Speed is of the upmost importance and so I would like to minimise the duration of this select query.
CodePudding user response:
Attempting to avoid sub-queries can lead you down the wrong path. You can indeed write bad queries with sub-queries, but you can write bad queries without them too, as well as perfectly good queries with them.
In your case, the sub-query is being used as a separate scope to find the list of classes of interest. Doing that in the same scope as you also get those classes' students is likely to be messy and inefficient.
For example, an alternative to using IN()
still requires a sub-query...
SELECT
student.*
FROM
table_name AS student
INNER JOIN
(
SELECT
class_id
FROM
table_name
WHERE
student_id IN (3 , 6)
GROUP BY
class_id
HAVING
COUNT(DISTINCT student_id) = 2
)
AS class
ON class.class_id = student.class_id
That, however, may yield a similar (or even identical) plan to the query you already have.
The closest I can get to "without a sub-query" is to use a nested query, which you can argue is still a sub-query but is easier for the optimiser to expand.
(Uses MySQL 8 )
SELECT
*
FROM
(
SELECT
*,
SUM(CASE WHEN student_id IN (3,6) THEN 1 END)
OVER (PARTITION BY class_id)
AS target_member_count
FROM
table_name
)
scanned
WHERE
target_member_count = 2
I would expect that to be slower in the case that most classes don't contain students 3 and/or 6. But you could try it.