I am new to MySQL, and I have a task involving three tables:
- students(id, name)
- courses(id, name)
- grades(id, student_id (FK), course_id (FK), grade)
I am supposed to get the name of the most popular course (the one with the most students enrolled). If there is a tie, I should return the course whose name is lexicographically smallest.
I tried several queries, but they are not 'efficient enough':
SELECT course.name FROM (
SELECT CI ,MAX(Total) FROM
(
SELECT course_id as CI,COUNT(*) AS Total
FROM grades
GROUP BY course_id ASC
) AS Results
) AS x
INNER JOIN courses ON x.CI = courses.id
And
SELECT courses.name FROM (
SELECT course_id, COUNT(*) AS how_many
FROM grades
GROUP BY course_id ASC
HAVING how_many = (
SELECT COUNT(*) AS how_many
FROM grades
GROUP BY course_id
ORDER BY how_many DESC
LIMIT 1
)
LIMIT 1
) AS X
JOIN courses ON X.course_id=courses.id
Is there a more efficient query?
CodePudding user response:
Both of your query attempts look logically incorrect to me. You should be joining courses to grades to obtain the number of students enrolled in each course. Regarding efficiency, the RANK window function is one of the more performant options, assuming you are running MySQL 8 or later:
WITH cte AS (
SELECT c.id, c.name, RANK() OVER (ORDER BY COUNT(*) DESC, c.name) rnk
FROM courses c
INNER JOIN grades g ON g.course_id = c.id
GROUP BY c.id, c.name
)
SELECT id, name
FROM cte
WHERE rnk = 1;
On earlier versions of MySQL, we can use a LIMIT query:
SELECT c.id, c.name
FROM courses c
INNER JOIN grades g ON g.course_id = c.id
GROUP BY c.id, c.name
ORDER BY COUNT(*) DESC, c.name
LIMIT 1;
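To check the tie-break behaviour of the LIMIT query, a small data set with two equally popular courses can help. This is only a sketch: the table and column names follow the question, but the column types and all of the rows are made up for illustration.

```sql
-- Hypothetical schema: names follow the question, column types are assumed.
CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE courses  (id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE grades (
  id INT PRIMARY KEY,
  student_id INT,
  course_id INT,
  grade INT,
  FOREIGN KEY (student_id) REFERENCES students(id),
  FOREIGN KEY (course_id) REFERENCES courses(id)
);

-- Made-up rows: Physics and Algebra each have 2 students, Biology has 1.
INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO courses  VALUES (1, 'Physics'), (2, 'Algebra'), (3, 'Biology');
INSERT INTO grades   VALUES
  (1, 1, 1, 90), (2, 2, 1, 80),   -- Physics: students 1 and 2
  (3, 1, 2, 70), (4, 3, 2, 85),   -- Algebra: students 1 and 3
  (5, 2, 3, 60);                  -- Biology: student 2

-- Same LIMIT query as above; the tie on count is broken by course name.
SELECT c.id, c.name
FROM courses c
INNER JOIN grades g ON g.course_id = c.id
GROUP BY c.id, c.name
ORDER BY COUNT(*) DESC, c.name
LIMIT 1;
-- With these rows the result is (2, 'Algebra').
```

Note that COUNT(*) counts rows in grades, so this equals the number of enrolled students only if each student has at most one grade row per course.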
CodePudding user response:
You can use the ORDER BY clause with the LIMIT clause to get what you need, without aggregating twice:
WITH enrollments AS (
SELECT course_id, COUNT(DISTINCT student_id) AS num_enrollments
FROM grades
GROUP BY course_id
)
SELECT c.name
FROM enrollments e
INNER JOIN courses c
ON e.course_id = c.id
ORDER BY e.num_enrollments DESC, c.name ASC
LIMIT 1;
The CTE counts the distinct students enrolled in each course, and its result is then joined with courses to get the course name.
The rows are then ordered by:
- number of enrollments, descending
- course name, ascending
and only the first row is returned.
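The WITH clause needs MySQL 8.0 or later. On an older server, the same idea can probably be written with a derived table instead. This is a sketch of that rewrite, not a tested drop-in replacement:

```sql
-- Same logic as the CTE version, with the aggregation moved into a derived table.
SELECT c.name
FROM (
  -- One row per course with its number of distinct enrolled students.
  SELECT course_id, COUNT(DISTINCT student_id) AS num_enrollments
  FROM grades
  GROUP BY course_id
) AS e
INNER JOIN courses c
  ON e.course_id = c.id
ORDER BY e.num_enrollments DESC, c.name ASC
LIMIT 1;
```

COUNT(DISTINCT student_id) is kept so that a student with several grade rows in the same course is counted only once.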