MySQL statement on a school system I have created some years back was working fine but now takes close to 30 seconds to pull what appears to me a simple statement but cannot figure out how to improve it. I am wondering if anyone can help me re-write this statement for a faster response. The statement is:
SELECT es.*,c.mainsubarea AS subject, b.name,b.email,GROUP_CONCAT(doc.document_file SEPARATOR "|") document_file
FROM usersubinfo es
LEFT JOIN userinfo b ON (es.uid=b.uid)
LEFT JOIN lkptsubjectarea c ON (es.mainsubjectarea=c.id)
LEFT JOIN lkptdeliverytime d ON (es.deliverytime = d.id)
LEFT JOIN documents doc ON (es.id = doc.order_id)
WHERE es.id AND es.is_active='Yes'
GROUP BY es.id
ORDER BY es.joindate
DESC LIMIT 0,25
See screenshot from Explain statement on phpMyAdmin
CodePudding user response:
Add Indexes: These may help:
b: INDEX(uid, name, email)
doc: INDEX(order_id, document_file)
Remove LEFT: Is there a reason for LEFT JOIN
instead of JOIN
? I think not. See if you get the same results without LEFTs
.
Remove bogus test: Why WHERE es.id
? If id
is the PRIMARY KEY
of es
, that test will always be true.
Improve GROUP ORDER: Change
GROUP BY es.id
ORDER BY es.joindate DESC
LIMIT 0,25
-->
GROUP BY es.joindate, es.id
ORDER BY es.joindate DESC, es.id DESC
LIMIT 0,25
That avoids two passes over the data -- one for the GROUPing, and another for the ORDERing. Meanwhile, I assume that my grouping and ordering is "just as good".
Turn inside-out: That brings up another issue, one I call "explode-implode". That's where you Join together a lot of rows, only to get rid of most of them. So...
Start by finding the 25 ids desired with as little effort as possible:
SELECT id
FROM usersubinfo
WHERE is_active = 'Yes'
GROUP BY joindate, id
ORDER BY joindate DESC, id DESC
LIMIT 0,25
And include that as a 'derived' table in the rest:
SELECT es.*, c.mainsubarea AS subject,
b.name, b.email,
GROUP_CONCAT(doc.document_file SEPARATOR "|") document_file
FROM ( put the above Select here
) AS ids
JOIN usersubinfo AS es USING(id)
JOIN userinfo b ON (es.uid=b.uid)
JOIN lkptsubjectarea c ON (es.mainsubjectarea=c.id)
JOIN lkptdeliverytime d ON (es.deliverytime = d.id)
JOIN documents doc ON (es.id = doc.order_id)
ORDER BY joindate DESC, id DESC; -- yes, repeat this
This is likely to be faster because the tables other than usersubinfo
will be touched only 25 times.
(I think this will happen to avoid the "only_full_group_by" issue to which Parfait refers.)
CodePudding user response:
First and foremost, consider writing a valid ANSI SQL aggregate query with changes to your GROUP BY
and SELECT
clauses. Currently, your query includes only one column in GROUP BY
but all columns from usersubinfo
table with SELECT es.*
in addition to other non-aggregated columns. You even order by a column not in GROUP BY
.
Such a query runs against the SQL standard and will fail in most RDBMS's but allowable in MySQL due to its ONLY_FULL_GROUP_BY
mode turned off which dangerously allows:
server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want
Since you have an aggregate function, GROUP_CONCAT
, all other non-aggregated columns should be placed in GROUP BY
clause. If you need to add columns in SELECT
add it also to GROUP BY
. Also, you may have a redundant LEFT JOIN
which serves no purpose for other JOIN
s or columns for SELECT
.
SELECT es.id,
es.joindate,
sa.mainsubarea AS subject,
i.name,
i.email,
GROUP_CONCAT(doc.document_file SEPARATOR "|") document_file
FROM usersubinfo es
LEFT JOIN userinfo i ON (es.uid = i.uid)
LEFT JOIN lkptsubjectarea sa ON (es.mainsubjectarea = sa.id)
-- LEFT JOIN lkptdeliverytime dlv ON (es.deliverytime = dlv.id) -- POSSIBLY REDUNDANT
LEFT JOIN documents doc ON (es.id = doc.order_id)
WHERE es.id IS NOT NULL
AND es.is_active = 'Yes'
GROUP BY es.id,
es.joindate,
sa.mainsubarea,
i.name,
i.email,
ORDER BY es.joindate DESC
LIMIT 0
OFFSET 25
Additionally, by avoiding SELECT *
you avoid bringing in unneeded even newer columns, allowing indexes to run effectively over large table scans, and avoid sending large amount of content over network. See Why is SELECT * considered harmful?