Mysql join optimize where clause


There are two tables in MySQL 5.7, each with 100,000 records. Each contains data like this:

id  name
-----------
1   name_1  
2   name_2  
3   name_3  
4   name_4  
5   name_5
...

The ddl is:

CREATE TABLE `table_a` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `table_b` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Now I execute the following two queries to see whether the latter will be faster.

select SQL_NO_CACHE *
from table_a a
inner join table_b b on a.name = b.name
where a.id between 50000 and 50100;
select SQL_NO_CACHE * 
from (
    select * 
    from table_a 
    where id between 50000 and 50100
) a 
inner join table_b b on a.name = b.name;

I think that in the former query, it would iterate up to 100,000 * 100,000 times and then filter the result by the where clause; in the latter query, it would first filter table_a down to about 100 intermediate rows and then iterate up to 100 * 100,000 times to get the final result. So the latter should be much faster than the former.
But the result is that both queries take about 1.5 seconds, and using an EXPLAIN statement I can't find any substantial difference between them.
Does MySQL optimize the former query so that it executes like the latter?

CodePudding user response:

For INNER JOIN, ON and WHERE are optimized the same way. For LEFT/RIGHT JOIN, the semantics are different, so the optimization is different. (Meanwhile, please use ON for stating the relationship between the tables and WHERE for filtering -- it helps humans understand the query.)
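
To illustrate that equivalence (table names from the question; for INNER JOIN the optimizer treats these two forms the same):

-- Equivalent for INNER JOIN:
SELECT * FROM table_a a INNER JOIN table_b b
    ON a.name = b.name WHERE a.id BETWEEN 50000 AND 50100;
SELECT * FROM table_a a INNER JOIN table_b b
    ON a.name = b.name AND a.id BETWEEN 50000 AND 50100;
-- For LEFT JOIN the same move changes the result: a condition in ON
-- only limits which b rows match (unmatched a rows still appear with
-- NULLs), while a condition in WHERE removes rows from the output.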

Both queries can start by fetching about 100 rows from a because of a.id between 50000 and 50100, then reach into the other table 100 times. But each probe has to do a table scan because of the lack of any useful index, so that is 100 x 100,000 operations. (This is a "Nested Loop Join", or "NLJ".)
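
You can see the scan with EXPLAIN; with no index on name, the row for b should show a full scan (the exact output depends on the server, but roughly):

EXPLAIN
SELECT SQL_NO_CACHE *
FROM table_a a
INNER JOIN table_b b ON a.name = b.name
WHERE a.id BETWEEN 50000 AND 50100;
-- Expect: a -> type: range on PRIMARY, rows ~ 100
--         b -> type: ALL, rows ~ 100,000, and in 5.7 typically
--              Extra: Using join buffer (Block Nested Loop)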

The solution to the slowness is to add

INDEX(name)

Add it at least to b. Or, if this is really a lookup table for mapping "names" to "ids", then UNIQUE(name). With either index, each of the 100 probes becomes a quick index lookup instead of a 100,000-row scan.
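
Concretely (the index names are arbitrary):

ALTER TABLE table_b ADD INDEX idx_name (name);
-- or, if each name can appear at most once:
-- ALTER TABLE table_b ADD UNIQUE KEY uk_name (name);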

Another technique for analyzing queries is

FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';

The Handler counters show the actual number of rows (data or index) touched. Values of 100,000 (or multiples of it) indicate full table/index scan(s) in your case.
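
Put together for the first query (the counter commentary below is a hypothetical reading, not output captured from this server):

FLUSH STATUS;
SELECT SQL_NO_CACHE *
FROM table_a a
INNER JOIN table_b b ON a.name = b.name
WHERE a.id BETWEEN 50000 AND 50100;
SHOW SESSION STATUS LIKE 'Handler%';
-- Without an index on name, expect a very large Handler_read_rnd_next
-- (full scan(s) of b). After adding INDEX(name), Handler_read_key
-- should be around 100 and the large counter should collapse.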

More: Index Cookbook

CodePudding user response:

Joins are often faster than equivalent sub-queries in MySQL, so prefer joins where you can to speed things up. In this case, though, the optimizer treats the two queries as equivalent.

Another way to optimize such a query is partitioning. With partitioning, MySQL can prune the scan to just the partition(s) that match your WHERE clause, reducing the time spent on unrelated records.
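
A hypothetical sketch of range-partitioning table_a by id so the BETWEEN predicate touches a single partition (table and partition names are illustrative; note that pruning adds little here, since id is already the primary key -- the join on name is the real bottleneck):

CREATE TABLE table_a_part (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
PARTITION BY RANGE (id) (
  PARTITION p0 VALUES LESS THAN (25000),
  PARTITION p1 VALUES LESS THAN (50000),
  PARTITION p2 VALUES LESS THAN (75000),
  PARTITION p3 VALUES LESS THAN MAXVALUE
);
-- EXPLAIN on a query with WHERE id BETWEEN 50000 AND 50100 should
-- then list only p2 in its "partitions" column (partition pruning).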
