Home > database >  MySQL uses wrong index
MySQL uses wrong index

Time:09-29

There is a table with these columns:

Name Type
Id datetime(6)
DateTime int
CompanyId int (FK)
IsExcluded tinyint(1)

There are 2 BTREE indexes:

  1. CompanyId
  2. ComapnyId,DateTime,IsExcluded

Following select is using the first index with only one column and this select takes 2.3sec. When I force the second index select takes 0.015sec. Also when I decrease DateTime range by one day MySQL is using the second index without forcing.

select IsExcluded,DateTime,CompanyId FROM table where 
IsExcluded = 0 and 
DateTime >= '2022-06-02' and 
DateTime < '2022-09-22' and 
CompanyId = 1;

I understand that if more than ~20%-30% of rows are selected, MySQL could decide to ignore index, but I don't understand why MySQL is selecting different obviously not the most suitable index.

Is there any way how to setup or "learn" MySql that second index for this query is the most suitable (without inserting anything else to query) ?

CodePudding user response:

Is there any way how to setup or "learn" MySql that second index for this query is the most suitable (without inserting anything else to query) ?

You can in theory tweak the index statistics manually to influence the optimizer's choice. See https://dev.mysql.com/blog-archive/histogram-statistics-in-mysql/

But frankly, I know of no developers who use this feature. It's just too difficult to figure out how to use it, and any custom index statistics you choose would be likely to get out of date very quickly.

What is more common is to use index hints to tell the optimizer to consider only a specific index, or to ignore other indexes. See https://dev.mysql.com/doc/refman/8.0/en/index-hints.html

Example:

SELECT IsExcluded, DateTime, CompanyId 
FROM table USE INDEX (myindex_with_three_columns)
WHERE IsExcluded = 0 
 AND DateTime >= '2022-06-02' 
 AND DateTime < '2022-09-22' 
 AND CompanyId = 1;

(I'm guessing at the name of your 3-column index.)

I would guess the reason your index is not used is that the DateTime portion is not helping enough to justify the wider index (i.e. your date range covers too many rows), and the IsExcluded portion is not used at all, because it follows a column that is used in a range condition. So the optimizer chooses an index that's more compact, so it can load the index into RAM in fewer I/O reads.

What's important to understand about column order in the index is that the columns involved in equality conditions should be to the left. Then you can have one column that is used in an inequality or range condition, and any subsequent columns in the index are not used for searching or sorting.

In your case you have an index on (CompanyId,DateTime,IsExcluded), but the second column is used in a range condition, so the third column isn't used for the search. At best it could try to use index condition pushdown, but that's not as good as really narrowing down the search.

The better index would be to order the columns so the DateTime column is last. Either of the columns used for equality may be first, but they both have to be before the DateTime column. Either of these orders would be better in this case:

(CompanyId,IsExcluded,DateTime)
(IsExcluded,CompanyId,DateTime)

CodePudding user response:

Updated

As per my understanding the order of keys does matter. Check also related article

MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.

More info at MySQL doc

CodePudding user response:

Given a 'smaller' INDEX(a) and a 'larger' INDEX(a,b), the Optimizer will often use the smaller index even though the larger one would do more filtering.

This is a recurring problem, I have seen it in many situations; there needs to be a bug report encouraging them to fix the Optimizer.

The workaround is to DROP the smaller index.

Furthermore, as Bill points out, your larger index is not optimal. In general, move any range column to the end of the index.

There's another point here: For that SELECT your 3-column index should have been picked because it is "covering". That means that all of the columns anywhere in the SELECT are in the single INDEX (in any order). (Bill's recommendations are both covering and faster.)

  • Related