Home > Software engineering >  MySQL performance - cross join vs left join
MySQL performance - cross join vs left join

Time:11-15

I am wondering how MySQL (or its underlying engine) processes the queries. There are two set queries below (one uses left join and the other one uses cross join), which eventually will give the same result.

My question is, how come the processing time of the two sets of queries are similar?

What I expected is that the first set query will run quicker because the computer is dealing with left join so the size of the "table" won't be expanding, while the second set of queries makes the size of the "table" (what I assume is that the computer needs to get the result of the cross-join from multiple tables before it can go ahead and do the where clause) relatively larger.

select s.*, a.score as score_01, b.score as score_02
from student s
left join (select \* from sc where cid = '01') a using (sid)
left join (select \* from sc where cid = '02') b using (sid)
where a.score > b.score;
select s.*, a.score as score_01, b.score as score_02
from student s
,(select * from sc where cid = '01') a
,(select * from sc where cid = '02') b
where a.score > b.score and a.sid = b.sid and s.sid = a.sid;

I tried both sets of queries and expected the processing time for the first set query will be shorter, but it is not the case.

CodePudding user response:

Add this to sc:

INDEX(sid, cid, score)

Better yet, if you have a useless id on side replace it with

PRIMARY KEY(sid, cid)`

(Assuming that pair is Unique.)

With either of those fixes, I expect both of your queries run at similar speed, and faster than currently.

For further discussion, please provide SHOW CREATE TABLE.

Addressing some of the Comments

MySQL ignores the keywords INNER, OUTER, and CROSS. So, it up to the WHERE to figure whether it is "inner" or "outer".

MySQL throws the ON and WHERE conditions together (except when it matters for LEFT), then decides what is used for filtering (WHERE) so it may be able to do that first. Then other conditions (which belonged in ON) help it get to the 'next' table.

So... Please use ON to say how the tables are related; use WHERE for filtering. (And don't use the old comma-join.)

That is, MySQL will [usually] look at one table at a time, doing a "Nested Loop Join" (NLJ) to get to the next.

There are many possible ways to evaluate a JOIN; MySQL ponders which one might be best, then uses that.

The order of non-LEFT JOINs does not matter, nor does the order of expressions AND'd together in WHERE.

In some situations, a HAVING expression can (and is) moved to the WHERE clause.

Although FROM comes before WHERE, the two get somewhat tangled up together. But, in general, the clauses are required to be in a certain order, and that order is logically the order that things have to happen in.

It is up to the Optimizer to combine steps. For example

WHERE a = 1
ORDER BY b

and the table has INDEX(a,b) -- The index will be used to do both, essentially at the same time. Ditto for

SELECT a, MAX(b)
    ...
    GROUP BY a
    ORDER BY a

can hop through the BTree index on (a,b) and deliver the results without an extra sort pass for either the GROUP BY or ORDER BY.

SELECT x is executed after WHERE y = 'abc' -- Well, in some sense it is. But if you have INDEX(y,x), the Optimizer is smart enough to grab the x values while it is performing the WHERE.

When a WHERE references more than one table of a JOIN, the Optimizer has a quandary. Which table should it start its NLJ with? It has some statistics to help make the decision, but it does not always get it right. It will usually

  1. filter on one of the tables
  2. NLJ to get to the next table, meanwhile throwing in any WHERE clauses for that table in with the ON clause.
  3. Repeat for other tables.

When there is both a WHERE and an ORDER BY, the Optimizer will usually filter filter, then sort. But sometimes (not always correctly) it will decide to use an index for the ORDER BY (thereby eliminating the sort) and filter as it reads the table. LIMIT, which is logically done last further muddies the decision.

MySQL does not have FULL OUTER JOIN. It can be simulated with two JOIN and a UNION. (It is only very rarely needed.)

  • Related