select top 100 *
from ProductionPeriodic.dbo.ScanDataRaw
where sdr_ID in (
select concat(rsw_dept, rsw_rsm_id_fk)
from [dbo].[RollSheetArchiveDetails] rsad
inner join dbo.RollSheetMain rsm on rsad.rsw_rsm_id_fk = rsm.rsm_id
where rsw_PoNo = 'UHB800008'
and rsm_status = 'R'
)
and sdr_ScanDate = '30/09/2022'
sdr_ScanDate is a string.
There are two values concatenated.
However, the query does not stop executing
CodePudding user response:
My presumption is that the subquery should run fast, but my hypothesis is that the query optimizer is doing the full join before applying the filters.
I would start by ensuring that
select rsw_dept, rsw_rsm_id_fk
from [dbo].[RollSheetArchiveDetails] rsad
inner join dbo.RollSheetMain rsm on rsad.rsw_rsm_id_fk = rsm.rsm_id
where rsw_PoNo = 'UHB800008'
and rsm_status = 'R'
runs quickly. If it does not, you probably need to either add indices or re-work this to align with the indices you have.
After that, it also needs to be the case that
SELECT TOP 100 *
FROM ProductionPeriodic.dbo.ScanDataRaw
WHERE sdr_ID IN ('id1', 'id2')
AND sdr_ScanDate = '30/09/2022'
also runs fast. Again, if not optimize this part independently.
If both of those are the case though the optimizer is probably getting tripped up by your correlated subquery. You probably want to make this a proper join:
SELECT TOP 100 ScanDataRaw.*
FROM ProductionPeriodic.dbo.ScanDataRaw
INNER JOIN ( SELECT DISTINCT
sw_dept,
rsw_rsm_id_fk
FROM [dbo].[RollSheetArchiveDetails]
INNER JOIN dbo.RollSheetMain
ON RollSheetArchiveDetails.rsw_rsm_id_fk = RollSheetMain.rsm_id
WHERE rsw_PoNo = 'UHB800008'
AND rsm_status = 'R'
) AS IdParts
ON ScanDataRaw.sdr_ID = IdParts.rsw_dept IdParts.rsw_rsm_id_fk
WHERE ScanDataRaw.sdr_ScanDate = '30/09/2022'
If the query optimizer still cannot figure out that the IdParts subquery executes quickly and returns few rows, you can try to force it to try each of the 3 physical joins using INNER LOOP JOIN, INNER HASH JOIN, and INNER MERGE JOIN, and you can try them each in the ScanDataRaw JOIN IdParts order and the IdParts JOIN ScanDataRaw order. Whichever of these 6 options is the fastest can become your query.
Ideally, though, it will be able to figure it out on its own without you having to specify a particular physical join.
CodePudding user response:
A curiosity: You should try with the top inside the subquery if you want a faster test and estimation of time.
Some optimizations:
- Check indexes vs where columns
- Add Distinct if there are multiple values to reduce the handling
select distinct concat(rsw_dept, rsw_rsm_id_fk)
Here are other ways of handling this:
Test inner join:
select top 100 ProductionPeriodic.dbo.ScanDataRaw.*
from ProductionPeriodic.dbo.ScanDataRaw
INNER JOIN (
select distinct concat(rsw_dept, rsw_rsm_id_fk) as sdr_ID_FILTER
from [dbo].[RollSheetArchiveDetails] rsad
inner join dbo.RollSheetMain rsm on rsad.rsw_rsm_id_fk = rsm.rsm_id
where rsw_PoNo = 'UHB800008'
and rsm_status = 'R'
) Filtering
on sdr_ID = Filtering.sdr_ID_FILTER
WHERE sdr_ScanDate = '20220930'
Test WHERE EXISTS:
select top 100 ScanDataRaw.*
from ProductionPeriodic.dbo.ScanDataRaw AS ScanDataRaw
WHERE sdr_ScanDate = '20220930'
AND EXISTS
(select 1
from [dbo].[RollSheetArchiveDetails] rsad
inner join dbo.RollSheetMain rsm on rsad.rsw_rsm_id_fk = rsm.rsm_id
where rsw_PoNo = 'UHB800008'
and rsm_status = 'R' AND concat(rsw_dept, rsw_rsm_id_fk)=ScanDataRaw.sdr_ID
)
Comments: You should use global standardized format 'YYYYMMDD' for dates when defining them into a query.