Table:
CREATE TABLE `hosts` (
`id` int NOT NULL AUTO_INCREMENT,
`host_name` varchar(255) DEFAULT NULL,
`ip` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unq_ip_host` (`ip`,`host_name`),
KEY `idx_host` (`host_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
MySQL version: mysql:5.7.36.
Here's the situation, in my project, the IP can match several hosts, I wanna get the result that matches (ip_a,host_a),(ip_b,host_b), so I use following sql:
select * from hosts h
where (h.ip,h.host_name) in (("198.168.214.166","la-worker-01")... ...)
Actually the target set is so large, around 3000 rows, this sql will scan the complete table, and it costs 24s, I tried to use FORCE INDEX (unq_ip_host)
, but it still cost a lot of time, almost equals scanning the whole table. Explain result:
type: range, key: unq_ip_host, key_len: 768,rows: 8952, filtered: 50.0, extra: Using where
Then I tried to use FORCE INDEX (idx_host)
, and this is the result of sql explain:
type: ALL, key: null, key_len: null,rows: 284000, filtered: 50.0, extra: Using where
type
is ALL
which means it won't use the index, this is weird, when I executed the sql, FORCE INDEX (idx_host)
costs about 4.1s, but FORCE INDEX (idx_ip_host)
costs 24s, this is weird.
select count(distinct(ip)) from hosts; // result: 900000
select count(distinct(host_name)) from hosts; // result: 890000
I can improve the sql but I'm puzzled as to why the two applications of FORCE INDEX
provide different results.
Question:
- Why doesn't MySQL use the index, when I use
FORCE INDEX (idx_host)
? - Why is scanning the entire table(
FORCE INDEX (idx_host)
) faster than using index(FORCE INDEX (idx_ip_host)
)?
CodePudding user response:
As per MySQL's docs when using FORCE INDEX
a table scan is used if there is no way to use one of the named indexes to find rows in the table.
In this case when you use FORCE INDEX (idx_host)
it can not find all the required information using this index and defaults to a FULL TABLE SCAN
, the table scan query needs 4 seconds in your case.
When you use FORCE INDEX (idx_ip_host)
it is able to get all it's required information (indexes have information regarding their PK), however due to the way your data/query is structured, using this index is inefficient hence the 24 seconds execution time.
You already mentioned that you wish to not touch the query, but it will probably be better to make a join/temporary table for this.
CodePudding user response:
Do you use id
for anything? Some queries will run faster with PRIMARY KEY(IP, host_name)
One way to get more insight into what is going on is:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
That should show numbers like 3000. But it sounds like it may be giving numbers like 1M.
Also, please provide EXPLAIN FORMAT=JSON SELECT ...