SELECT MAX vs ORDER BY LIMIT 1 question has been answered here several times, but if I add a WHERE clause, things change dramatically
Here is my table:
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
id | b'int' | 'NO' | 'PRI' | None | 'auto_increment' |
'open_time' | b'bigint' | 'NO' | 'UNI' | None | '' |
Note, that both columns are indexed.
And here are the requests:
SELECT id from table
WHERE open_time > 0
ORDER BY id DESC LIMIT 1
SELECT MAX(id) from BTCUSDT1mHist
WHERE open_time > 0
EXPLAIN ANALYZE shows the following: ORDER BY:
-> Limit: 1 row(s) (cost=0.10 rows=1) (actual time=0.038..0.038 rows=1 loops=1)
-> Filter: (table.open_time > 0) (cost=0.10 rows=1) (actual time=0.037..0.037 rows=1 loops=1)
-> Index scan on table using PRIMARY (reverse) (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)
MAX():
-> Aggregate: max(table.id) (cost=325890.06 rows=1081033) (actual time=1025.181..1025.181 rows=1 loops=1)
-> Filter: (table.open_time > 0) (cost=217786.76 rows=1081033) (actual time=0.032..866.890 rows=2180645 loops=1)
-> Index range scan on table using open_time (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)
ORDER BY finishes in 0.0012 seconds, while MAX() does in 1.026 seconds
I have read this question also, but it doesn't seem to cover my situation
The question is: why does MAX() takes so much longer than ORDER BY LIMIT?
CodePudding user response:
Compare in the analyze:
-> Index scan on table using PRIMARY (reverse) (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)
Versus:
-> Index range scan on table using open_time (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)
Examining 2 rows must be a lot quicker than examining 2,180,645 rows.
In the query with ORDER BY id DESC LIMIT 1
, it uses the primary key index. It starts at the end because it's a reverse order. Then it just iterates down the leaf nodes of the index (in descending order), until it examines the first row that also matches open_time > 0
. Then the LIMIT optimization allows the query execution to finish. Based on its statistics, it estimates this will happen after examining 2 rows.
In the query with MAX(id)
, it uses the index on open_time
. But because it's a range condition open_time > 0
, it can't assume the maximum id is found at the start or end of that range. So it must examine every matching entry in the open_time
index, searching for the greatest value of id
(primary keys are implicitly part of a secondary index). There's no chance of early-termination as there is in the query with LIMIT
.