Let's say I have a MySQL table defined like this:
CREATE TABLE big_table (
primary_1 varbinary(1536),
primary_2 varbinary(1536),
ts timestamp(6),
...
PRIMARY KEY (primary_1, primary_2),
KEY ts_idx (ts),
)
I would like to implement efficient pagination (seeking pagination) as described in this blog post https://use-the-index-luke.com/sql/partial-results/top-n-queries
If I only use the first part of the primary key, the pipelined execution works fast and as expected:
mysql> explain select * from big_table order by ts, primary_1 limit 5;
---- ------------- ------------------------------------- ------------ ------- --------------- -------- --------- ------ ------ ---------- -------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------------------------------------- ------------ ------- --------------- -------- --------- ------ ------ ---------- -------
| 1 | SIMPLE | big_table | NULL | index | NULL | ts_idx | 7 | NULL | 5 | 100.00 | NULL |
---- ------------- ------------------------------------- ------------ ------- --------------- -------- --------- ------ ------ ---------- -------
However, if I add the second part of the primary key to the ORDER BY clause everything slows down and filesort starts being used:
mysql> explain select * from big_table order by ts, primary_1, primary_2 limit 5;
---- ------------- ------------------------------------- ------------ ------ --------------- ------ --------- ------ --------- ---------- ----------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------------------------------------- ------------ ------ --------------- ------ --------- ------ --------- ---------- ----------------
| 1 | SIMPLE | big_table | NULL | ALL | NULL | NULL | NULL | NULL | 6388499 | 100.00 | Using filesort |
---- ------------- ------------------------------------- ------------ ------ --------------- ------ --------- ------ --------- ---------- ----------------
Is it not possible to do this pipelined execution and ordering on composite primary? Or should the query be written in some special way?
CodePudding user response:
Without prior knowledge about how MySQL works internally, there is no reason to assume that an index on just ts
can be used to order by ts, primary_1
without doing an additonal (file)sort on primary_1
. Imagine e.g. the edge case that all values for ts
are the same - the index will just give you all rows, which you then have to sort by primary_1
.
Nevertheless, MySQL can make use of some additional information: InnoDB stores secondary indexes in a way that includes the primary key columns (to be able to find the actual row in the table). Since that information is there anyway, MySQL can just make use of it - and it does, by using Index Extensions. This basically extends the index ts
to an index ts, primary_1, primary_2
.
So this technical trick allows you to use the index on ts
to order by ts, primary_1, primary_2
. But since there is always a "but", here is the "but":
Use of index extensions by the optimizer is subject to the usual limits on the number of key parts in an index (16) and the maximum key length (3072 bytes).
The index on ts, primary_1, primary_2
would be longer than 3072 bytes. You can e.g. also not create such an index manually. So this extension doesn't work anymore, and MySQL falls back to treating the index on ts
like an index on just ts
.
So why does it work for order by ts, primary_1
? Well, even if, for those technical reasons, MySQL cannot create an internal index on ts, primary_1, primary_2
, it could at least do it for ts, primary_1
without running into technical problems. MySQL actually doesn't do that though - but the MariaDB developers implemented this trick, so I assume you are actually using MariaDB. Nevertheless, the length restriction of 3072 still applies, so your order by both primary columns still won't work.
What can you do?
If you can shorten your primary keys a bit, the index extension would work again. Primary keys that long (and of that type) are uncommon and unpractical anyway (not only for this use case), so maybe you can find a different primary key for your table.
If that is not an option, you may be able to utilize some prior knowledge about your data distribution, e.g. if you know that at most 10 values for ts
can be the same, you can first pick the first n 10 rows (using the index), then order only those by the primary keys. If you usually only show the first few pages, this might speed up your specific situation. But you may want to ask a separate question for it with specific details.