Query fast when run separately but slow when added inside a join-CodePudding

In SQL Server, this query runs very fast, less than a second:

SELECT T1.id
FROM first AS T1
WHERE T1.id = 21

This query also runs very fast, less than a second, even though it has 53 million records but only has about six records for id 21:

SELECT TOP 1 T2.value
FROM second AS T2 WITH(INDEX(IX_second))
WHERE T2.id = 21 
  AND T2.b = 1 
  AND T2.c = 0 
  AND T2.d = 0 
  AND T2.e = 0
ORDER BY T2.id, T2.b, T2.c, T2.d, T2.e, T2.timestamp DESC

However, this query, where I replace the 21 in the inner SELECT with T1.id, is very, very slow, more than 80 seconds:

SELECT T1.id, T3.value
FROM first AS T1
JOIN second AS T3 ON T3.id IN (SELECT TOP 1 T2.id
                               FROM second AS T2 WITH(INDEX(IX_second))
                               WHERE T2.id = T1.id 
                                 AND T2.b = 1 
                                 AND T2.c = 0 
                                 AND T2.d = 0 
                                 AND T2.e = 0
                               ORDER BY T2.id, T2.b, T2.c, T2.d, T2.e, T2.timestamp DESC)
 WHERE T1.id = 21

Why would this query take so very long and how do I make it faster?

Edit: Here is the plan, with some table and field names changed to protect the innocent :) brentozar.com/pastetheplan/?id=rJYBSfwws

CodePudding user response：

This looks like a case where CROSS APPLY can be used. This allows the TOP 1 and ORDER BY, but would avoid a double reference to the second table.

Try:

SELECT T1.id, T3.value
FROM first AS T1
CROSS APPLY (
    SELECT TOP 1 T2.*
    FROM second AS T2 --WITH(INDEX(IX_second))
    WHERE T2.id = T1.id 
    AND T2.b = 1 
    AND T2.c = 0 
    AND T2.d = 0 
    AND T2.e = 0
    ORDER BY T2.id, T2.b, T2.c, T2.d, T2.e, T2.timestamp DESC
) T3
WHERE T1.id = 21

If IX_second is an index on second(id), it is likely that the SQL server query optimizer will select that index without the need for an index hint.

Just to check: Are you sure you mean T2.id = T1.id and not something like T2.first_id = T1.id?

One more note: Since T2.id, T2.b, T2.c, T2.d, T2.e will all be fixed in the CROSS APPLY results, you can likely drop them from the ORDER BY clause.

ADDENDUM: Based on the query from your posted execution plan, the above would be equivalent to:

SELECT *
FROM Event_Item AS ei
CROSS APPLY (
    SELECT TOP 1 eisp1.*
    FROM Event_Item_Spread AS eisp1
        -- WITH(INDEX(IX_Event_Item_Spread__event_item__sportsbook__period__ingame__alt__timestamp_desc))
    WHERE eisp1.event_item_id = ei.id
      AND eisp1.sportsbook_id = 1
      AND eisp1.period = 0
      AND eisp1.in_game = 0
      AND eisp1.alt = 0
    ORDER BY eisp1.timestamp DESC
) AS eisp
WHERE ei.id = 39604314

CodePudding user response：

All depends on the way the DBMS is computing the data. But in this case 53M is going to be processed (T1 X T3) Times. And each time there is going to do order, reverse, query, and possibly some join with source table if the index was not build with values (b,c,d,e...).

I don't get the logic of your query. Why not use just one join...

SELECT T1.id, T3.value
  FROM first AS T1
  JOIN second AS T2 ON T1.id = T2.id 
 WHERE T1.id = 21 
  AND T2.b = 1 AND T2.c = 0 AND T2.d = 0 AND T2.e = 0

This looks to have se same result as the above.

CodePudding user response：

Using sub-queries inside an IN() statement is syntactically ok, but I don't recommend it, as they are a slow way of managing JOINs between tables.

As per others have suggested, use a temporary table. Even a CTE() in my view is at least a cleaner version of the code and much easier to understand. You also need to study your execution plan.

I've not tested this, but try something like:

    SELECT TOP 1 T2.id
    INTO #tblTEMP1
    FROM second AS T2 
    WHERE T2.b = 1 
      AND T2.c = 0 
      AND T2.d = 0 
      AND T2.e = 0
    ORDER BY T2.id, T2.timestamp DESC
..
..
    SELECT T1.id, T3.value
    FROM first AS T1
    INNER JOIN #tblTEMP1 AS t ON t.ID = t1.ID
    INNER JOIN second AS T3 ON T3.id = t.ID
    WHERE T1.id = 21

I bet this will be faster, but perhaps not as fast as you'd expect.

Again, you need to study your execution plan each case and determine the exact bottle-neck(s) and place INDEXes where necessary.