I have a score table like this -
student_id | course_id | score |
---|---|---|
01 | 01 | 80 |
01 | 02 | 90 |
01 | 03 | 99 |
02 | 02 | 70 |
02 | 03 | 60 |
I want to generate a table showing a student's scores in one row, and if there is no score, show it as NULL. Like this -
student_id | 01 | 02 | 03 |
---|---|---|---|
01 | 80 | 90 | 99 |
02 | NULL | 70 | 60 |
The code that I came up with is -
SELECT
student_id,
(CASE WHEN course_id = '01' THEN score ELSE NULL END) AS '01',
(CASE WHEN course_id = '02' THEN score ELSE NULL END) AS '02',
(CASE WHEN course_id = '03' THEN score ELSE NULL END) AS '03'
FROM table;
And as expected I got three scores in separate rows. Like this -
student_id | 01 | 02 | 03 |
---|---|---|---|
01 | 80 | NULL | NULL |
01 | NULL | 90 | NULL |
01 | NULL | NULL | 99 |
02 | NULL | 70 | NULL |
02 | NULL | NULL | 60 |
The answer says I need to use MAX in front of the CASE statement. Also GROUP BY student_id at the end, or the student with NULL score value will not be shown in the result. Like this -
SELECT
student_id,
MAX(CASE WHEN course_id = '01' THEN score ELSE NULL END) AS '01',
MAX(CASE WHEN course_id = '02' THEN score ELSE NULL END) AS '02',
MAX(CASE WHEN course_id = '03' THEN score ELSE NULL END) AS '03',
FROM table
GROUP BY student_id;
My questions are:
- How does MAX combine three rows each with one score, into one row with three scores? What happened here?
- If I do not use GROUP BY student_id at the end, the result will exclude the student with NULL score value. Why? How does GROUP BY include the student back?
Any help would be appreciated! :)
CodePudding user response:
GROUP By
will produce a result set with one row for every student_id.
The MAX
will garantue that the higest Value will be in the row for everys student_id as you have one score and multiple NULLs.
To get all Studnent_ids you need to LEFT JOIN
all students, so ou have all.
SELECT
t1.student_id,
COALESCE(`01`,0) as '01',
COALESCE(`02`,0) as '02',
COALESCE(`03`,0) as '03'
FROM
(SELECT DISTINCT student_id FROM score) t1
LEFT JOIN (SELECT
student_id,
MAX(CASE WHEN course_id = '01' THEN score ELSE NULL END) AS '01',
MAX(CASE WHEN course_id = '02' THEN score ELSE NULL END) AS '02',
MAX(CASE WHEN course_id = '03' THEN score ELSE NULL END) AS '03',
FROM table
GROUP BY student_id) t2 ON t1.student_id = t2.student_id
CodePudding user response:
With your intermediate results, you need to return one row per student_id
which is where the group by
comes in, it collects all rows together for each unique value of the grouped columns.
On its own that wouldn't get you anywhere, you also need to tell SQL what to do with the other non-grouped columns, which is where max()
comes in.
This returns the maximum value per group (i.e. per each unique Student_Id) and automatically excludes NULL values; since there is only one value other than NULL in each column, the required value is returned. Note you can equally use other aggregate functions eg Min()
- the end result is the same.
Also note you can drop the else null
part as NULL is the default when not matched.
CodePudding user response:
This kind of operation is called "Pivoting", and it uses two steps:
- "enlarging the schema", by adding more columns to fit the maximum number of potential values that a column can have (in your specific case, the column is "score"). This operation just affects the amount of columns, while keeping the amount of rows same as the original table. Output for the single id
- "aggregating the rows", necessary step as long as we have already distributed the values of your column in different numbered columns. This operation just affects the amount of rows, while keeping the amount of columns same as the output of the previous step. Output for the single id
So how do you do each of these two steps?
In order to "enlarge the schema", you require CASE
statements (how many? the amount equals the maximum amount of values you want to extract for each id), which will extract only one value per row, for each id. Given that each new CASE
column will have one non-NULL and others NULL-only values (for each id), then the MAX
aggregation function comes in handy, and it will select the maximum value between one non-null value and all other NULL values (which results in selecting the non-null value). In order to limit the range of action of the MAX
aggregation function to each specific id, we use the GROUP BY
, which tells the DBMS to select the maximum value for each field(s) specified in its definition (for this task, only "student_id").
Hopefully this will answer both your doubts regarding how this way of solving pivoting problems work.