How to SELECT multiple rows from same column optimally-CodePudding

[Student] table

pkid	FirstName
01	John
02	Mary
03	Kate

[CourseGrades] table

SID	courseName	Grade
01	maths	A
01	english	B
01	science	C
01	geography	D
02	english	C
02	science	B
02	geography	C
03	maths	B
03	english	E
03	geography	D

[CourseGrades] is the master grade table where grades are just jotted down in no particular order and does not require any grades for students to be present.

Desired output:

FirstName	MathsGrade	ScienceGrade
John	A	C
Mary	NULL	B
Kate	B	NULL

I'm currently using the query below to get my desired output.

SELECT 
    s.FirstName,
    cg1.Grade AS 'MathsGrade',
    cg2.Grade AS 'ScienceGrade',
FROM 
    [Student] s
LEFT JOIN 
    [CourseGrades] cg1 ON s.pkid = cg1.SID AND cg1.courseName = 'maths'
LEFT JOIN 
    [CourseGrades] cg2 ON s.pkid = cg2.SID AND cg2.courseName = 'science'

Is there a better (performance wise) way to select only maths and science grades for all students?

I need the output to show blank if the the student doesn't have the grade for that course. Which is why I'm using LEFT JOIN.

CodePudding user response：

You can aggregate and use CASE to weed out any grade that doesn't match the course. This assumes that each student only has one grade. If that's not the case, then you can't just use MAX(). There is FIRST_VALUE, but it is not going to work with nulls, and it's a little more verbose anyway. See this fiddle (SQL Server 2019, you didn't specify):

https://dbfiddle.uk/?https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=05cf04d9a81d4660d7ce35a20f8c810f

SELECT s.FirstName,
  max(case when c.courseName = 'maths' then c.Grade else null end) as Maths,
  max(case when c.courseName = 'science' then c.Grade else null end) as Science
FROM Student s
LEFT JOIN CourseGrades c on s.pkid = c.sid
WHERE c.CourseName in ('maths', 'science')
GROUP BY s.FirstName;

You can remove the IN ('maths', 'science') clause. It doesn't change the result. It might make the query more efficient, but I don't think you're going to have enough data to worry about it. So unless query optimization is part of the assignment, I think I'd just go with whatever was easier to write.

I am not sure that this approach is better than the one you have. The left joins are cleaner reading than the case, even if you have to do them twice. As far as performance, I'm going to switch the engine to mySQL for just a minute so I can use the json formatted explain there:

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=457e42be15ae17b90a8a287614726605

You can see that the aggregate function without the WHERE clause is about 50% more costly (query_cost 3.80) than the one with it (2.10), but the two LEFT JOINs are about 17x more costly (34.05).

But again, there's not enough data to worry about performance. I would stick with what's easiest to read, understand and maintain.

CodePudding user response：

To achieve more performance you need to set those other columns as indexes as well and also optimise the query (this is specific to the database you are using, e.g. ANALYSE ...)