Home > Net >  Concatenating two columns from different table and using them as WHERE clause
Concatenating two columns from different table and using them as WHERE clause

Time:10-19

select top 100 * 
from ProductionPeriodic.dbo.ScanDataRaw
where sdr_ID in (
    select concat(rsw_dept, rsw_rsm_id_fk) 
    from [dbo].[RollSheetArchiveDetails] rsad
    inner join dbo.RollSheetMain rsm on rsad.rsw_rsm_id_fk = rsm.rsm_id 
    where rsw_PoNo = 'UHB800008'
    and rsm_status = 'R'
)
and sdr_ScanDate = '30/09/2022'

sdr_ScanDate is a string.

There are two values concatenated.

However, the query does not stop executing

CodePudding user response:

My presumption is that the subquery should run fast, but my hypothesis is that the query optimizer is doing the full join before applying the filters.

I would start by ensuring that

select rsw_dept, rsw_rsm_id_fk 
    from [dbo].[RollSheetArchiveDetails] rsad
    inner join dbo.RollSheetMain rsm on rsad.rsw_rsm_id_fk = rsm.rsm_id 
    where rsw_PoNo = 'UHB800008'
    and rsm_status = 'R'

runs quickly. If it does not, you probably need to either add indices or re-work this to align with the indices you have.

After that, it also needs to be the case that

SELECT TOP 100 * 
FROM   ProductionPeriodic.dbo.ScanDataRaw
WHERE  sdr_ID IN ('id1', 'id2')
AND    sdr_ScanDate = '30/09/2022'

also runs fast. Again, if not optimize this part independently.

If both of those are the case though the optimizer is probably getting tripped up by your correlated subquery. You probably want to make this a proper join:

SELECT TOP 100 ScanDataRaw.* 
FROM        ProductionPeriodic.dbo.ScanDataRaw
INNER JOIN  (   SELECT DISTINCT 
                    sw_dept, 
                    rsw_rsm_id_fk 
                FROM        [dbo].[RollSheetArchiveDetails]
                INNER JOIN  dbo.RollSheetMain 
                ON          RollSheetArchiveDetails.rsw_rsm_id_fk = RollSheetMain.rsm_id 
                WHERE       rsw_PoNo = 'UHB800008'
                AND         rsm_status = 'R'
            ) AS IdParts
ON          ScanDataRaw.sdr_ID = IdParts.rsw_dept   IdParts.rsw_rsm_id_fk
WHERE       ScanDataRaw.sdr_ScanDate = '30/09/2022'

If the query optimizer still cannot figure out that the IdParts subquery executes quickly and returns few rows, you can try to force it to try each of the 3 physical joins using INNER LOOP JOIN, INNER HASH JOIN, and INNER MERGE JOIN, and you can try them each in the ScanDataRaw JOIN IdParts order and the IdParts JOIN ScanDataRaw order. Whichever of these 6 options is the fastest can become your query.

Ideally, though, it will be able to figure it out on its own without you having to specify a particular physical join.

CodePudding user response:

A curiosity: You should try with the top inside the subquery if you want a faster test and estimation of time.

Some optimizations:

  1. Check indexes vs where columns
  2. Add Distinct if there are multiple values to reduce the handling select distinct concat(rsw_dept, rsw_rsm_id_fk)

Here are other ways of handling this:

Test inner join:

select top 100 ProductionPeriodic.dbo.ScanDataRaw.* 
from ProductionPeriodic.dbo.ScanDataRaw 
INNER JOIN (
    select distinct concat(rsw_dept, rsw_rsm_id_fk)  as sdr_ID_FILTER
    from [dbo].[RollSheetArchiveDetails] rsad
    inner join dbo.RollSheetMain rsm on rsad.rsw_rsm_id_fk = rsm.rsm_id 
    where rsw_PoNo = 'UHB800008'
    and rsm_status = 'R'
) Filtering
    on sdr_ID =  Filtering.sdr_ID_FILTER
WHERE sdr_ScanDate = '20220930'

Test WHERE EXISTS:

select top 100 ScanDataRaw.* 
from ProductionPeriodic.dbo.ScanDataRaw AS ScanDataRaw
WHERE sdr_ScanDate = '20220930' 
AND EXISTS 
(select 1   
    from [dbo].[RollSheetArchiveDetails] rsad
    inner join dbo.RollSheetMain rsm on rsad.rsw_rsm_id_fk = rsm.rsm_id 
    where rsw_PoNo = 'UHB800008'
    and rsm_status = 'R' AND concat(rsw_dept, rsw_rsm_id_fk)=ScanDataRaw.sdr_ID
)

Comments: You should use global standardized format 'YYYYMMDD' for dates when defining them into a query.

  • Related