How can this change be making my query slow (OR vs UNION) and can I fix it?-CodePudding

I've just been debugging a slow SQL query.

It's a join between 2 tables, with a WHERE clause conditioning on either a property of 1 table OR the other.

If I re-write it as a UNION then it's suddenly 2 orders of magnitude faster, even though those 2 queries produce identical outputs:

DECLARE @UserId UNIQUEIDENTIFIER = '0019813D-4379-400D-9423-56E1B98002CB'

SELECT *
    FROM Bookings
    LEFT JOIN BookingPricings ON Booking = Bookings.ID
    WHERE (BookingPricings.[Owner] in (@UserId) OR Bookings.MixedDealBroker in (@UserId))

--Execution time: ~4000ms



SELECT *
    FROM Bookings
    LEFT JOIN BookingPricings ON Booking = Bookings.ID
    WHERE (BookingPricings.[Owner] in (@UserId))
UNION
SELECT *
    FROM Bookings
    LEFT JOIN BookingPricings ON Booking = Bookings.ID
    WHERE (Bookings.MixedDealBroker in (@UserId))

--Execution time: ~70ms

This seems rather surprising to me! I would have expected the SQL compiler to be entirely capable of identifying that the 2nd form was equivalent and would have used that compilation approach if it were available.

Some context notes:

I've checked and IN (@UserId) vs = @UserId makes no difference.
Nor does JOIN vs LEFT JOIN.
Those tables each have 100,000s records, and the filter cuts it down to ~100.
In the slow version it seems to be reading every row of both tables.

So:

Does anyone have any ideas for how this comes about.
What (if anything) can I do to fix the performance without just re-writing the query as a series of UNIONs (not viable for a variety of reasons.)

=-=-=-=-=-=-=

Execution Plans:

CodePudding user response：

This is a common limitation of SQL engines, not just in SQL Server, but also other database systems as well. The OR complicates the predicate enough that the execution plan selected isn't always ideal. This probably relates to the fact that only one index can be seeked into per instance of a table object at a time (for the most part), as Stu mentioned in the comments, and other factors with how SQL engines are designed.

By using a UNION clause, you now have two instances of the Bookings table referenced, which can individually be seeked on separately in the most efficient way possible. That allows the SQL Engine to pick a better execution plan to serve you query.

This is pretty much just one of those things that are the way they are because that's just the way it is, and you need to remember the UNION clause workaround for future encounters of this kind of performance issue.

Also, in response to your comment:

I don't understand how the difference can affect the EP, given that the 2 different "phrasings" of the query are identical?

A new execution plan is generated every time one doesn't exist in the plan cache for a given query, essentially. The way the Engine determines if a plan for a query is already cached is based on the exact hashing of that query statement, so even an extra space character at the end of the query can result in a new plan being generated. Theoretically that plan can be different. So a different written query (despite being logically the same) can surely result in a different execution plan.

There are other reasons a plan can change on re-generation too, such as different data and statistics of that data, in the tables referenced in the query between executions. But these reasons don't really apply to your question above.

CodePudding user response：

As already stated, the OR condition prevents the database engine from efficiently using the indexes in a single query. Because the OR condition spans tables, I doubt that the Tuning Advisor will come up with anything useful.

If you have a case where the query you have posted is part of a larger query, or the results are complex and you do not want to repeat code, you can wrap your initial query in a Common Table Expression (CTE) or a subquery and then feed the combined results into the remainder of your query. Sometimes just selecting one or more PKs in your initial query will be sufficient.

Something like:

SELECT <complex select list>
FROM (
    SELECT Bookings.ID AS BookingsID, BookingPricings.ID AS BookingPricingsID
        FROM Bookings
        LEFT JOIN BookingPricings ON Booking = Bookings.ID
        WHERE (BookingPricings.[Owner] in (@UserId))
    UNION
    SELECT Bookings.ID AS BookingsID, BookingPricings.ID AS BookingPricingsID
        FROM Bookings B
        LEFT JOIN BookingPricings ON Booking = Bookings.ID
        WHERE (Bookings.MixedDealBroker in (@UserId))
) PRE
JOIN Bookings B ON B.ID = PRE.BookingsID
JOIN BookingPricings BP ON BP.ID = PRE.BookingPricingsID
<more joins>
WHERE <more conditions>

Having just the IDs in your initial select make the UNION more efficient. The UNION can also be changed to a yet more-efficient UNION ALL with careful use of additional conditions, such as AND Bookings.MixedDealBroker <> @UserId in the second part, to avoid overlapping results.