How can this query be optimized to avoid the full table scan described below?
I've got a slow query that's taking approximately 15 seconds to return.
Let's get this part out of the way - I've confirmed all indexes are there.
When I run EXPLAIN
, it shows that there is a FULL TABLE scan ran on the crosswalk
table (the index for fromQuestionCategoryJoinID
is not used, even if I attempt to force) - if I remove either of the fields and the OR
, the index is used and query completes in milliseconds.
SELECT c.id, c.name, GROUP_CONCAT(DISTINCT tags.externalDisplayID SEPARATOR ', ') AS tags
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id OR cw.fromQuestionCategoryJoinID = qcatjsub.id)
-- index used if I remove OR, eg.: LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id)
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
GROUP BY c.id
ORDER BY c.name, tags.externalDisplayID;
CodePudding user response:
Split the query into two queries for each part of the OR
. Then combine them with UNION
.
SELECT id, name, GROUP_CONCAT(DISTINCT externalDisplayID SEPARATOR ', ') AS tags
FROM (
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatj.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
UNION ALL
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatjsub.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
) AS x
GROUP BY x.id
ORDER BY x.name
Also, it doesn't make sense to include externalDisplayID
in ORDER BY
, because that will order by its value from a random row in the group. You could put ORDER BY externalDisplayID
in the GROUP_CONCAT()
arguments if that's what you want.
CodePudding user response:
There is a second inefficiency going on here. I call it "explode-implode". First a bunch of JOINs
(potentially) expand the number of rows in an intermediate table, then GROUP BY c.id
collapses the number of rows back to what you started with (one row of output per row of checkpoint
).
Before trying to help with that, please answer:
- Is
LEFT
really needed? - How many rows in each table? (Especially in
cw
) - Can you get rid of
DISTINCT
?
Barmar's answer can possibly be improved upon by delaying the JOINs to
qcj1and
tagsuntil after the
UNION`:
SELECT ...
FROM ( SELECT ...
FROM first few tables
UNION ALL
SELECT ...
FROM first few tables
) AS u
[LEFT] JOIN qcj1
[LEFT] JOIN tags
GROUP BY ...
ORDER BY ...
Another optimization (again building on Barmar's)
GROUP BY x.id
ORDER BY x.name
-->
GROUP BY x.name, x.id
ORDER BY x.name, x.id
When the items in GROUP BY
and ORDER BY
are the "same", they can be done in a single action, thereby saving (at least) a sort.
x.name, x.id
is deterministic, where as x.name
might put two rows with the same name
in a different order, depending (perhaps) on the phase of the moon.
These indexes may help:
qcheckj: INDEX(checklistID, questionID)
qcatj: INDEX(questionID, id)
qcatjsub: INDEX(parentQuestionID, id)
cw: INDEX(fromQuestionCategoryJoinID, toQuestionCategoryJoinID)
CodePudding user response:
Try it this way:
SELECT c.id, c.name, GROUP_CONCAT(DISTINCT tags.externalDisplayID
SEPARATOR ', ') AS tags
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID =
qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID =
qcatj.questionID
LEFT JOIN crosswalk cw on
(cw.fromQuestionCategoryJoinID,cw.fromQuestionCategoryJoinID) =
(qcatj.id,qcatjsub.id)
LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id)
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id =
cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
GROUP BY c.id
ORDER BY c.name, tags.externalDisplayID;
This removes the OR and achieve the same .