MySQL SELECT MAX vs ORDER BY LIMIT 1 on WHERE clause-CodePudding

SELECT MAX vs ORDER BY LIMIT 1 question has been answered here several times, but if I add a WHERE clause, things change dramatically

Here is my table:

Field	Type	Null	Key	Default	Extra
id	b'int'	'NO'	'PRI'	None	'auto_increment'
'open_time'	b'bigint'	'NO'	'UNI'	None	''

Note, that both columns are indexed.

And here are the requests:

SELECT id from table
WHERE open_time > 0
ORDER BY id DESC LIMIT 1

SELECT MAX(id) from BTCUSDT1mHist
WHERE open_time > 0

EXPLAIN ANALYZE shows the following: ORDER BY:

-> Limit: 1 row(s)  (cost=0.10 rows=1) (actual time=0.038..0.038 rows=1 loops=1)
    -> Filter: (table.open_time > 0)  (cost=0.10 rows=1) (actual time=0.037..0.037 rows=1 loops=1)
        -> Index scan on table using PRIMARY (reverse)  (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)

MAX():

-> Aggregate: max(table.id)  (cost=325890.06 rows=1081033) (actual time=1025.181..1025.181 rows=1 loops=1)
    -> Filter: (table.open_time > 0)  (cost=217786.76 rows=1081033) (actual time=0.032..866.890 rows=2180645 loops=1)
        -> Index range scan on table using open_time  (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)

ORDER BY finishes in 0.0012 seconds, while MAX() does in 1.026 seconds

I have read this question also, but it doesn't seem to cover my situation

The question is: why does MAX() takes so much longer than ORDER BY LIMIT?

CodePudding user response：

Compare in the analyze:

    -> Index scan on table using PRIMARY (reverse)  (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)

Versus:

    -> Index range scan on table using open_time  (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)

Examining 2 rows must be a lot quicker than examining 2,180,645 rows.

In the query with ORDER BY id DESC LIMIT 1, it uses the primary key index. It starts at the end because it's a reverse order. Then it just iterates down the leaf nodes of the index (in descending order), until it examines the first row that also matches open_time > 0. Then the LIMIT optimization allows the query execution to finish. Based on its statistics, it estimates this will happen after examining 2 rows.

In the query with MAX(id), it uses the index on open_time. But because it's a range condition open_time > 0, it can't assume the maximum id is found at the start or end of that range. So it must examine every matching entry in the open_time index, searching for the greatest value of id (primary keys are implicitly part of a secondary index). There's no chance of early-termination as there is in the query with LIMIT.