I am struggling to figure this out, I have a simple MySQL table (test_scores) that is three columns, id(students), date and test grade.
id | date | grade |
---|---|---|
1 | 10/20/22 | A |
2 | 10/20/22 | B |
1 | 11/1/22 | F |
2 | 11/1/22 | C |
3 | 11/1/22 | A |
1 | 12/1/22 | C |
I want to get the grade a student gets on the next test after they get an F. In this example I would like to get the following output.
id | grade | next_grade |
---|---|---|
1 | F | C |
I have tried:
SELECT id, grade, grade as next_grade FROM test_scores WHERE grade = F;
I'm just learning advanced joins and nested queries, and I just can't figure out how to get the next grade. Any help would be appreciated.
CodePudding user response:
SELECT t.id, t.grade, t2.grade AS next_grade
FROM test_scores t
LEFT JOIN test_scores t2 ON UNIX_TIMESTAMP(t2.date) >= UNIX_TIMESTAMP(t.date) AND t2.id <> t.id
WHERE t.grade='F'
ORDER BY t2.date,t2.id
LIMIT 1
The above will JOIN
the test_grades with itself, to fetch other rows with greater or equal dates but different grade.
Then i have done an ORDER BY
and LIMIT 1
, to get only the first such row, not all, and it must be the one with minimum date and id among those rows, because you want the nearest one in the future (and the one with minimum id, if there are multiple ones with same date).
Also i have used LEFT JOIN
because the next date may not exist, so in such case you would get a null next_grade
.
If you want it to just not return any records in that case, use a regular JOIN
Let me know if it was not clear.
P.S.
UNIX_TIMESTAMP
will convert the date to its unix timestamp, and it may be not necessary, you can compare the date columns directly if they are of DATE
type and not strings (VARCHAR
or TEXT
variants).
The UNIX_TIMESTAMP
approach would still work in case the date column is not a DATE
but a VARCHAR
or TEXT
. So i used it to stay safe.
CodePudding user response:
There is plenty of room for optimization.
First you have to guess, that each student can't have many graduation, but number of student is big. Also not many student have F.
So. First you have do is get all students ever get F grade
select id, `date` from test_scores where grade='F'
After that you need try to find another grade for that students.
select a.id,a.grade,b.grade from
(select id,`date`,grade from test_scores where grade = 'F') a
left join
test_scores b on
a.id = b.id and a.date < b.date
where b.grade is not null
This one give all scores which was after F. i.e will give two rows if you have F,A,B. But this should not happens, no?
You can replace left join with lookup if you think you can have more than one row, but it will be slightly less speed.
select * from
(select id, grade,
(select grade from test_scores AS b
where b.id=a.id and a.date < b.date
order by b.date limit 1
) AS next_grade
from test_scores AS b
where grade = 'F'
) AS result
where result.next_grade is not null
Recomended indexes - test_score(id) - I hope you have it. I mean you HAVE to make index on primary column in most cases.
Maybe can be useful test_score(id,date) - but this has to be checked on real data. Likely will be useful from 10m tables, but very low chance. just because every single student can't have really many graduations results in his life.
ps. this will work only on date or timestamp column. If your column is text replace join to
a.id = b.id and
DATE_FORMAT(a.date, '%M/%d/%Y') < DATE_FORMAT(b.date, '%M/%d/%Y')
p.p.s. if your tables are really big you may consider not do subquery, but move where from first subquery outside and just do left join. Which convert this SQL from index lookup type to index match type. It may or may not speed up it.