I am wondering how MySQL (or its underlying engine) processes the queries. There are two set queries below (one uses left join and the other one uses cross join), which eventually will give the same result.
My question is, how come the processing time of the two sets of queries are similar?
What I expected is that the first set query will run quicker because the computer is dealing with left join so the size of the "table" won't be expanding, while the second set of queries makes the size of the "table" (what I assume is that the computer needs to get the result of the cross-join from multiple tables before it can go ahead and do the where clause) relatively larger.
select s.*, a.score as score_01, b.score as score_02
from student s
left join (select \* from sc where cid = '01') a using (sid)
left join (select \* from sc where cid = '02') b using (sid)
where a.score > b.score;
select s.*, a.score as score_01, b.score as score_02
from student s
,(select * from sc where cid = '01') a
,(select * from sc where cid = '02') b
where a.score > b.score and a.sid = b.sid and s.sid = a.sid;
I tried both sets of queries and expected the processing time for the first set query will be shorter, but it is not the case.
CodePudding user response:
Add this to sc
:
INDEX(sid, cid, score)
Better yet, if you have a useless id
on side
replace it with
PRIMARY KEY(sid, cid)`
(Assuming that pair is Unique.)
With either of those fixes, I expect both of your queries run at similar speed, and faster than currently.
For further discussion, please provide SHOW CREATE TABLE
.
Addressing some of the Comments
MySQL ignores the keywords INNER
, OUTER
, and CROSS
. So, it up to the WHERE
to figure whether it is "inner" or "outer".
MySQL throws the ON
and WHERE
conditions together (except when it matters for LEFT
), then decides what is used for filtering (WHERE
) so it may be able to do that first. Then other conditions (which belonged in ON
) help it get to the 'next' table.
So... Please use ON
to say how the tables are related; use WHERE
for filtering. (And don't use the old comma-join.)
That is, MySQL will [usually] look at one table at a time, doing a "Nested Loop Join" (NLJ) to get to the next.
There are many possible ways to evaluate a JOIN
; MySQL ponders which one might be best, then uses that.
The order of non-LEFT JOINs
does not matter, nor does the order of expressions AND'd
together in WHERE
.
In some situations, a HAVING
expression can (and is) moved to the WHERE
clause.
Although FROM
comes before WHERE
, the two get somewhat tangled up together. But, in general, the clauses are required to be in a certain order, and that order is logically the order that things have to happen in.
It is up to the Optimizer to combine steps. For example
WHERE a = 1
ORDER BY b
and the table has INDEX(a,b)
-- The index will be used to do both, essentially at the same time. Ditto for
SELECT a, MAX(b)
...
GROUP BY a
ORDER BY a
can hop through the BTree index on (a,b) and deliver the results without an extra sort pass for either the GROUP BY
or ORDER BY
.
SELECT x
is executed after WHERE y = 'abc'
-- Well, in some sense it is. But if you have INDEX(y,x)
, the Optimizer is smart enough to grab the x
values while it is performing the WHERE
.
When a WHERE
references more than one table of a JOIN
, the Optimizer has a quandary. Which table should it start its NLJ with? It has some statistics to help make the decision, but it does not always get it right. It will usually
- filter on one of the tables
- NLJ to get to the next table, meanwhile throwing in any
WHERE
clauses for that table in with theON
clause. - Repeat for other tables.
When there is both a WHERE
and an ORDER BY
, the Optimizer will usually filter filter, then sort. But sometimes (not always correctly) it will decide to use an index for the ORDER BY
(thereby eliminating the sort) and filter as it reads the table. LIMIT
, which is logically done last further muddies the decision.
MySQL does not have FULL OUTER JOIN
. It can be simulated with two JOIN
and a UNION
. (It is only very rarely needed.)