In SQL Server, this query runs very fast, less than a second:
SELECT T1.id
FROM first AS T1
WHERE T1.id = 21
This query also runs very fast, less than a second, even though the table has 53 million records (only about six of them are for id 21):
SELECT TOP 1 T2.value
FROM second AS T2 WITH(INDEX(IX_second))
WHERE T2.id = 21
AND T2.b = 1
AND T2.c = 0
AND T2.d = 0
AND T2.e = 0
ORDER BY T2.id, T2.b, T2.c, T2.d, T2.e, T2.timestamp DESC
However, this query, where I replace the 21 in the inner SELECT with T1.id, is very, very slow, more than 80 seconds:
SELECT T1.id, T3.value
FROM first AS T1
JOIN second AS T3 ON T3.id IN (SELECT TOP 1 T2.id
FROM second AS T2 WITH(INDEX(IX_second))
WHERE T2.id = T1.id
AND T2.b = 1
AND T2.c = 0
AND T2.d = 0
AND T2.e = 0
ORDER BY T2.id, T2.b, T2.c, T2.d, T2.e, T2.timestamp DESC)
WHERE T1.id = 21
Why would this query take so very long and how do I make it faster?
Edit: Here is the plan, with some table and field names changed to protect the innocent :) brentozar.com/pastetheplan/?id=rJYBSfwws
CodePudding user response:
This looks like a case where CROSS APPLY can be used. It allows the TOP 1 and ORDER BY, but avoids a double reference to the second table.
Try:
SELECT T1.id, T3.value
FROM first AS T1
CROSS APPLY (
SELECT TOP 1 T2.*
FROM second AS T2 --WITH(INDEX(IX_second))
WHERE T2.id = T1.id
AND T2.b = 1
AND T2.c = 0
AND T2.d = 0
AND T2.e = 0
ORDER BY T2.id, T2.b, T2.c, T2.d, T2.e, T2.timestamp DESC
) T3
WHERE T1.id = 21
If IX_second is an index on second(id), it is likely that the SQL Server query optimizer will select that index without the need for an index hint.
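For illustration only, here is a sketch of an index that could serve the CROSS APPLY lookup without a hint. The real definition of IX_second is not shown in the question, so the index name IX_second_covering, the key column order, and the INCLUDE column are all assumptions based on the columns used in the query:

```sql
-- Hypothetical index; the actual IX_second may be defined differently.
-- Key columns follow the equality filters, then timestamp DESC so that
-- TOP 1 ... ORDER BY timestamp DESC can read the first matching row.
CREATE NONCLUSTERED INDEX IX_second_covering
    ON second (id, b, c, d, e, [timestamp] DESC)
    INCLUDE (value);  -- assumed: avoids a lookup when only value is selected
```

If an index of this shape already exists, the optimizer should be able to seek on it for each T1.id and stop after one row.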
Just to check: Are you sure you mean T2.id = T1.id and not something like T2.first_id = T1.id?
One more note: Since T2.id, T2.b, T2.c, T2.d, and T2.e will all be fixed in the CROSS APPLY results, you can likely drop them from the ORDER BY clause.
ADDENDUM: Based on the query from your posted execution plan, the above would be equivalent to:
SELECT *
FROM Event_Item AS ei
CROSS APPLY (
SELECT TOP 1 eisp1.*
FROM Event_Item_Spread AS eisp1
-- WITH(INDEX(IX_Event_Item_Spread__event_item__sportsbook__period__ingame__alt__timestamp_desc))
WHERE eisp1.event_item_id = ei.id
AND eisp1.sportsbook_id = 1
AND eisp1.period = 0
AND eisp1.in_game = 0
AND eisp1.alt = 0
ORDER BY eisp1.timestamp DESC
) AS eisp
WHERE ei.id = 39604314
CodePudding user response:
It all depends on how the DBMS computes the data. But in this case the 53M rows are likely to be processed (T1 x T3) times, and each time the engine has to order, reverse, and query, and possibly join back to the source table if the index was not built with the values (b, c, d, e...).
I don't get the logic of your query. Why not use just one join...
SELECT T1.id, T2.value
FROM first AS T1
JOIN second AS T2 ON T1.id = T2.id
WHERE T1.id = 21
AND T2.b = 1 AND T2.c = 0 AND T2.d = 0 AND T2.e = 0
This looks to have the same result as the above.
CodePudding user response:
Using sub-queries inside an IN() clause is syntactically OK, but I don't recommend it, as they are a slow way of managing JOINs between tables.
As others have suggested, use a temporary table. Even a CTE is, in my view, at least a cleaner version of the code and much easier to understand. You also need to study your execution plan.
I've not tested this, but try something like:
SELECT TOP 1 T2.id
INTO #tblTEMP1
FROM second AS T2
WHERE T2.b = 1
AND T2.c = 0
AND T2.d = 0
AND T2.e = 0
ORDER BY T2.id, T2.timestamp DESC
..
..
SELECT T1.id, T3.value
FROM first AS T1
INNER JOIN #tblTEMP1 AS t ON t.id = T1.id
INNER JOIN second AS T3 ON T3.id = t.id
WHERE T1.id = 21
I bet this will be faster, but perhaps not as fast as you'd expect.
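The CTE alternative mentioned in this answer could be sketched as follows. This is untested against the asker's data and uses ROW_NUMBER() to pick the latest row per id, which is my own substitution for TOP 1; the CTE name latest and the alias rn are made up for the example:

```sql
-- Sketch only: ranks the filtered rows per id, newest first.
-- A statement preceding WITH must be terminated with a semicolon.
WITH latest AS (
    SELECT T2.id,
           T2.value,
           ROW_NUMBER() OVER (PARTITION BY T2.id
                              ORDER BY T2.[timestamp] DESC) AS rn
    FROM second AS T2
    WHERE T2.b = 1
      AND T2.c = 0
      AND T2.d = 0
      AND T2.e = 0
)
SELECT T1.id, L.value
FROM first AS T1
JOIN latest AS L
  ON L.id = T1.id
 AND L.rn = 1          -- keep only the newest row for each id
WHERE T1.id = 21;
```

Whether the optimizer pushes the id = 21 filter down into the CTE is worth confirming in the execution plan; if it does not, this form could end up ranking far more rows than needed.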
Again, you need to study your execution plan in each case, determine the exact bottleneck(s), and place indexes where necessary.
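One way to compare the variants alongside the execution plan is to turn on SQL Server's I/O and timing statistics for the session. The sketch below wraps the CROSS APPLY form from this thread; the table and column names are the anonymized ones from the question:

```sql
-- Report logical reads and CPU/elapsed time in the Messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT T1.id, T3.value
FROM first AS T1
CROSS APPLY (
    SELECT TOP 1 T2.value
    FROM second AS T2
    WHERE T2.id = T1.id
      AND T2.b = 1
      AND T2.c = 0
      AND T2.d = 0
      AND T2.e = 0
    ORDER BY T2.[timestamp] DESC
) AS T3
WHERE T1.id = 21;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running each candidate query this way and comparing the logical reads on second should show which one touches the fewest pages.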