I'm having some performance issues due to full scans being run on some larger tables for a report. I've narrowed things down to this section of the query but can't figure out how to avoid the scans without changing the results.
To explain, we have a data archiving system that copies data from the live table to the archive table daily. The data is not removed from the live table until a period of time has passed. This results in a state where the live table and archive table will both have the same rows, but the data in the rows may not match.
This rules out a UNION query (which would eliminate the full scans). The requirements are for the report to show live data, so I also can't query just the archive table.
Any ideas? Here is the query. The primary keys of both tables is DetailIdent, but I do have an index on OrderIdent, as it's a foreign key back to the parent table. You can see that we take the main table results if they exist, otherwise we fall back to the archive data.
SELECT COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) AS OrderIdent,
COALESCE(RegOD.Quantity, ArcOD.Quantity) AS Quantity,
COALESCE(RegOD.LoadQuan, ArcOD.LoadQuan) AS LoadQuan,
COALESCE(RegOD.ShipQuan, ArcOD.ShipQuan) AS ShipQuan,
COALESCE(RegOD.RcvdQuan, ArcOD.RcvdQuan) AS RcvdQuan,
COALESCE(RegOD.UOM, ArcOD.UOM) AS UOM,
COALESCE(RegOD.SkidType, ArcOD.SkidType) AS SkidType,
COALESCE(RegOD.Product, ArcOD.Product) AS Product,
COALESCE(RegOD.PkgCode, ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD
FULL JOIN dbo.ArcOrderDtl ArcOD
ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) = 717010
CodePudding user response:
The filtering predicate COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) = 717010
is killing performance and it's forcing the engine to perform a full scan first, and filter data later.
Option 1 - Rephrase the COALESCE() function
Rephrase the COALESCE()
function and let the engine do its work. With a bit of luck the engine will be smart enough to find the optimization. In this case the query can take the form:
SELECT
COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) AS OrderIdent,
COALESCE(RegOD.Quantity,ArcOD.Quantity) AS Quantity,
COALESCE(RegOD.LoadQuan,ArcOD.LoadQuan) AS LoadQuan,
COALESCE(RegOD.ShipQuan,ArcOD.ShipQuan) AS ShipQuan,
COALESCE(RegOD.RcvdQuan,ArcOD.RcvdQuan) AS RcvdQuan,
COALESCE(RegOD.UOM,ArcOD.UOM) AS UOM,
COALESCE(RegOD.SkidType,ArcOD.SkidType) AS SkidType,
COALESCE(RegOD.Product,ArcOD.Product) AS Product,
COALESCE(RegOD.PkgCode,ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD
FULL JOIN dbo.ArcOrderDtl ArcOD ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE RegOD.OrderIdent = 717010 or ArcOD.OrderIdent = 717010
Option 2 - Combine a left join with a right anti-join instead of using a full join
If the engine doesn't optimize Option #1 above, you can still try combining a left join with a right anti-join instead of writing a full join (they are equivalent). It's definitively more verbose, but in this case it clearly shows the engine what to do. This query could look like:
SELECT -- left join here
COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) AS OrderIdent,
COALESCE(RegOD.Quantity,ArcOD.Quantity) AS Quantity,
COALESCE(RegOD.LoadQuan,ArcOD.LoadQuan) AS LoadQuan,
COALESCE(RegOD.ShipQuan,ArcOD.ShipQuan) AS ShipQuan,
COALESCE(RegOD.RcvdQuan,ArcOD.RcvdQuan) AS RcvdQuan,
COALESCE(RegOD.UOM,ArcOD.UOM) AS UOM,
COALESCE(RegOD.SkidType,ArcOD.SkidType) AS SkidType,
COALESCE(RegOD.Product,ArcOD.Product) AS Product,
COALESCE(RegOD.PkgCode,ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD
LEFT JOIN dbo.ArcOrderDtl ArcOD ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE RegOD.OrderIdent = 717010
UNION ALL
SELECT -- right anti-join here
OrderIdent,
Quantity,
LoadQuan,
ShipQuan,
RcvdQuan,
UOM,
SkidType,
Product,
PkgCode
FROM dbo.ArcOrderDtl ArcOD
LEFT JOIN OrderDetail RegOD ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE ArcOD.OrderIdent = 717010 and RegOD.DetailIdent IS NULL
CodePudding user response:
You want all rows for an OrderIdent, but rows (identified by DetailIdent) can be either in OrderDetail or in ArcOrderDtl or both. You want to give precedence to the OrderDetail rows if they exist.
One idea hence is to select all rows and then rank them, giving OrderDetail a better rank than ArcOrderDtl, Use TOP WITH TIES
then to get all the better ranked rows and dismiss the others.
SELECT TOP(1) WITH TIES
OrderIdent, Quantity, LoadQuan, ShipQuan, RcvdQuan, UOM, SkidType, Product, PkgCode
FROM
(
SELECT
DetailIdent, OrderIdent, Quantity, LoadQuan, ShipQuan, RcvdQuan, UOM, SkidType,
Product, PkgCode, 1 AS priority
FROM OrderDetail
WHERE OrderIdent = 717010
UNION ALL
SELECT
DetailIdent, OrderIdent, Quantity, LoadQuan, ShipQuan, RcvdQuan, UOM, SkidType,
Product, PkgCode, 2 AS priority
FROM dbo.ArcOrderDtl
WHERE OrderIdent = 717010
) unioned
ORDER BY RANK() (PARTITION BY DetailIdent ORDER BY priority);
CodePudding user response:
According to your definition there is no need for FULL JOIN
, UNION ALL
, COALESCE
. If DetailIdent
is primary key and is in Clustered index you can use this feature. As you said: OrderDetail
has priority and then ArcOrderDetail
.
DECLARE @last_not_deleted INT
DECLARE @order_ident INT = 717010
SELECT @last_not_deleted = min(DetailIdent) FROM OrderDetail
SELECT
OrderIdent,
Quantity,
LoadQuan,
ShipQuan,
RcvdQuan,
UOM,
SkidType,
Product,
PkgCode
FROM OrderDetail
WHERE OrderIdent = @order_ident
UNION
SELECT
OrderIdent,
Quantity,
LoadQuan,
ShipQuan,
RcvdQuan,
UOM,
SkidType,
Product,
PkgCode
FROM ArchOrderDetail
WHERE OrderIdent = @order_ident
AND DetailIdent < @last_not_deleted
- I have selected last not deleted
DetailIdent
fromOrderDetail
. - I have selected all valid rows from live table without
JOIN
,FULL JOIN
,LEFT JOIN
. - These data (from both tables) can be simply
UNION
, becasue you know that rows from Archive table are only in Archive table.
This logic will masively improve your performance. And there is simple transformation into SELECT valid rows of multiple events:
DECLARE @last_not_deleted INT
DECLARE @order_ident INT = 717010
SELECT @last_not_deleted = min(DetailIdent) FROM OrderDetail
SELECT
OrderIdent,
Quantity,
LoadQuan,
ShipQuan,
RcvdQuan,
UOM,
SkidType,
Product,
PkgCode
FROM OrderDetail as b
JOIN #order_idents as oi on b.OrderIdent = oi.OrderIdent
UNION
SELECT
OrderIdent,
Quantity,
LoadQuan,
ShipQuan,
RcvdQuan,
UOM,
SkidType,
Product,
PkgCode
FROM ArchOrderDetail
JOIN #order_idents as oi on b.OrderIdent = oi.OrderIdent
AND DetailIdent < @last_not_deleted
Where #order_idents is table with one integer column:
OrderIdent
717010
717011
...
717050
CodePudding user response:
Try to include all column in your index, something like:
CREATE NONCLUSTERED INDEX IX_OrderDetail_OrderIdent
ON OrderDetail (OrderIdent)
INCLUDE (—ALL COLUMN IN SELECT)
GO
CREATE NONCLUSTERED INDEX IX_OrderDtl_OrderIdent
ON OrderDtl (OrderIdent)
INCLUDE (—ALL COLUMN IN SELECT)
GO