Home > Software engineering >  Avoid filesort in simple filtered ordered query
Avoid filesort in simple filtered ordered query

Time:04-06

I have a simple table:

CREATE TABLE `user_values` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `user_id` bigint NOT NULL,
  `value` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`,`id`),
  KEY `id` (`id`,`user_id`);
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

that I am trying to execute the following simple query:

select * from user_values where user_id in (20020, 20030) order by id desc;

I would fully expect this query to 100% use an index (either the (user_id, id) one or the (id, user_id) one) Yet, it turns out that's not the case:

explain select * from user_values where user_id in (20020, 20030); yields:

id select_type table partitions type key key_len ref rows filtered Extra
1 SIMPLE user_values NULL range user_id 8 NULL 9 100.00 Using index condition; Using filesort

Why is that the case? How can I avoid a filesort on this trivial query?

CodePudding user response:

You can't avoid the filesort in the query you show.

When you use a range predicate (for example, IN ( ) is a range predicate), and an index is used, the rows are read in index order. But there's no way for the MySQL query optimizer to guess that reading the rows in index order by user_id will guarantee they are also in id order. The two user_id values you are searching for are potentially scattered all over the table, in any order. Therefore MySQL must assume that once the matching rows are read, an extra step of sorting the result by id is necessary.

Here's an example of hypothetical data in which reading the rows by an index on user_id will not be in id order.

id user_id
1 20030
2 20020
3 20016
4 20030
5 20020

So when reading from an index on (user_id, id), the matching rows will be returned in the following order, sorted by user_id first, then by id:

id user_id
2 20020
5 20020
1 20030
4 20030

Clearly, the result is not in id order, so it needs to be sorted to satisfy the ORDER BY you requested.

The same kind of effect happens for other type of predicates, for example BETWEEN, or < or != or IS NOT NULL, etc. Every predicate except for = is a range predicate.

The only ways to avoid the filesort are to change the query in one of the following ways:

Omit the ORDER BY clause and accepting the results in whatever order the optimizer chooses to return them, which could be in id order, but only by coincidence.

Change the user_id IN (20020, 20030) to user_id = 20020, so there is only one matching user_id, and therefore reading the matching rows from the index will already be returned in the id order, and therefore the ORDER BY is a no-op. The optimizer recognizes when this is possible, and skips the filesort.

CodePudding user response:

MySQL will most likely use index for the query (unless the user_id's in the query covers most of the rows).

The "filesort" happens in memory (it's really not a filesort), and is used to sort the found rows based on the ORDER BY clause.

CodePudding user response:

You cannot avoid a "sort" in this case.

There were about 9 rows to sort, so it could not have taken long.

How long did the query take? Probably only a few milliseconds, so who cares?

"Filesort" does not necessarily mean that a "file" was involved. In many queries the sort is done in RAM.

Do you use id for anything other than to have a PRIMARY KEY on the table? If not, then this will help a small amount. (The speed-up won't be indicated in EXPLAIN.)

  PRIMARY KEY (`user_id`,`id`),  -- to avoid secondary lookups
  KEY `id` (`id`);   -- to keep auto_increment happy
  • Related