Home > Net >  MySQL - Slow performance for limit offset on json column
MySQL - Slow performance for limit offset on json column

Time:12-08

I'm seeing slow performance when using a LIMIT OFFSET on a SELECT statement for a table that includes a JSON column.

The following query takes 3 minutes to complete:

SELECT t.json_column
FROM table t
LIMIT 501
OFFSET 216204;

However, if I only select t.id, the query only takes a couple of milliseconds.

The slow performance is only visible when the JSON column is part of the SELECT statement, but I don't understand why, or what can I do to improve it.

To give more context, the MySql version I'm using is 5.7, the service is running on an AWS Aurora database, and the number of rows on the table is ~216.000

CodePudding user response:

When MySQL uses OFFSET, it cannot just skip to row 216204. MySQL indexes index by value, not by row number. So it must actually examine all those rows, which means reading the pages on which the rows are stored. Loading them from disk into RAM if necessary.

When you only reference t.id, it can scan along the index. I'll assume id is indexed (maybe even the primary key), and it's probably an integer (4 bytes) or bigint (8 bytes). Regardless, a lot of those entries can fit in a given InnoDB page (each page is of fixed size, 16KB each).

Whereas the JSON column is like TEXT or BLOB or VARCHAR, in that it is probably stored on additional pages, and it's much more bulky than a single integer. So it takes more pages. If the JSON documents are especially large, it may even take many pages per document.

So that's a lot of storage I/O to load all those JSON documents. It is many times the I/O work to load 216,000 JSON documents, compared to only referencing the primary key. It might be the case that the primary key pages are already cached in RAM anyway, so there is little or no I/O needed to read them.

It might be even worse with AWS Aurora than traditional MySQL, because Aurora uses distributed storage and replicated buffer pools, so there's additional overhead to both the I/O and the loading into RAM.

What can you do instead?

Stop using large values in your OFFSET. Search by values instead.

SELECT t.json_column
FROM table t
WHERE t.id >= 208000
LIMIT 501;

This is quick because the index helps to skip directly to the 208,000th row without examining all the preceding rows. It will only need to examine the rows starting with the row where id=208000 (or whatever value you search for). It stops examining rows once it finds enough for your LIMIT.

  • Related