Which one is the better query plan?-CodePudding

I have this query:

SELECT TOP 1 
    MAX(HORA_LEIT), ST_BOMBA, Q_BOMBA, SEQUENCIAL 
FROM 
    DADOS 
WHERE 
    COD_PONTO = 2085 
    AND (ST_BOMBA = 'ON' OR ST_BOMBA = 'OFF') 
GROUP BY 
    ST_BOMBA, Q_BOMBA, SEQUENCIAL 
ORDER BY 
    MAX(HORA_LEIT) DESC

I decided to create two indexes:

CREATE INDEX ix_1 
ON dados (cod_ponto, St_bomba) 
INCLUDE (q_bomba, sequencial, hora_leit) 
WHERE St_bomba IN ('ON', 'OFF')

Actual execution plan: https://www.brentozar.com/pastetheplan/?id=HkxiKmDXs

and

CREATE INDEX ix_2 
ON dados (cod_ponto, hora_leit) 
INCLUDE (St_bomba, q_bomba, sequencial) 
WHERE St_bomba IN ('ON', 'OFF')

Actual execution plan: https://www.brentozar.com/pastetheplan/?id=By_1tmDQj

I figured out, as far as I can see, that the first execution plan is better, although the query optimizer is giving preference to the second one.

Am I misevaluating the performance?

CodePudding user response：

Aside from marc said in the comments, the second query plan only has to seek into 1 row in the index (Actual Number of Rows) and has a perfect cardinality estimate then because it also estimated 1 row.

The first execution plan estimates around 250 rows but actually seeks into 95 rows on the index, so slightly less efficient and a worse cardinality estimate.

But to be honest, for such a simple query plan and small amount of data, you might find the best way to compare is by looking at the TIME STATISTICS and IO STATISTICS. E.g. run SET STATISTICS TIME, IO ON; first, then run each query that produces each plan above and compare. The results will be in the Messages window in SSMS.

TIME STATISTICS will give you the Parse and Compile Time of the query (which shouldn't be super relevant for this test) and the total CPU and Elapsed Time.

IO STATISTICS will tell you how many data pages were read from Memory (the Logical Reads) to serve your query.

The plan that required less of either or both is generally better.

CodePudding user response：

The ix_2 is clearly better.

The first thing to note is that the query is written in a very convoluted way.

The query

SELECT TOP 1 
    MAX(HORA_LEIT), ST_BOMBA, Q_BOMBA, SEQUENCIAL 
FROM 
    DADOS 
WHERE 
    COD_PONTO = 2085 
    AND (ST_BOMBA = 'ON' OR ST_BOMBA = 'OFF') 
GROUP BY 
    ST_BOMBA, Q_BOMBA, SEQUENCIAL 
ORDER BY 
    MAX(HORA_LEIT) DESC

is equivalent to

SELECT TOP 1 
    HORA_LEIT, ST_BOMBA, Q_BOMBA, SEQUENCIAL 
FROM 
    DADOS 
WHERE 
    COD_PONTO = 2085 
    AND (ST_BOMBA = 'ON' OR ST_BOMBA = 'OFF') 
ORDER BY 
    HORA_LEIT DESC

As you are only interested in the TOP 1 the GROUP BY can be optimized out here.

Both execution plans recognize this and don't contain any aggregation operators.

For index 2 the filtered index condition guarantees that all rows match the ST_BOMBA condition. It just has to do a backward ordered index seek on COD_PONTO = 2085 and read the first row and then stop (as the second key column is HORA_LEIT).

For index 1 the index seek is actually two seeks.

seek on (COD_PONTO, ST_BOMBA) = (2085, 'ON')
seek on (COD_PONTO, ST_BOMBA) = (2085, 'OFF')

The combined rows matching either of those conditions then go into the Top N sort to get the TOP 1 matching row as ordered by HORA_LEIT DESC. It is only 96 rows in this case but is potentially unbounded and just depends on your data.

Reading a single row and stopping is better than reading an arbitrary number of rows and sorting them.