Replacing WHERE clause with OR by using INNER JOIN-CodePudding

I have the following query: that is trying to get pt_ids that have certain diagnosis codes OR certain outcome_codes. I have accounted for these conditions by including the specific codes I'd like in the WHERE clause. However, I would like to know if I can also do this using in INNER JOIN.

SELECT DISTINCT
       pt_id
FROM #df
WHERE flag <> 1
      AND (diagnosis IN
          (
              SELECT Code
              FROM #df_codes
              WHERE code_id = '20'
          )
      OR outcome_code IN
         (
             SELECT Code
             FROM #df_codes
             WHERE code_id = '25'
         ));

The above code is taking quite a while to run and I am wondering if writing it in the following way 1) is appropriate/viable and 2) will leader to a faster run time. I am essentially trying to join #df with #df_codes on either Code = diagnosis OR code = outcome_code

My Solution

SELECT DISTINCT
       pt_id
FROM #df a
JOIN (select * from #df_codes where code_id = '20')b ON a.diagnosis = b.Code OR 
(select * from #df_codes where code_id = '25')c ON a.outcome_code = c.Code
WHERE flag <> 1

CodePudding user response：

I would do it like this:

SELECT DISTINCT pt_id
FROM #df
INNER JOIN #df_codes ON (
        #df.diagnosis = #df_codes.code
        AND #df_codes.code_id = '20'
        )
    OR (
        #df.outcome_code = #df_codes.code
        AND #df_codes.code_id = '25'
        )
WHERE flag <> 1

CodePudding user response：

Just a rough suggestion to create small temp tables and join it back to main table. I would suggest use the CTE or original table query and do following:

SELECT Code
into #x
FROM #df_codes
WHERE code_id = '20'

SELECT Code
into #y
FROM #df_codes
WHERE code_id = '25'


SELECT
pt_id
FROM #df
join #x x on x.Code = diagnosis 
WHERE flag <> 1
Union
SELECT
pt_id
FROM #df
join #y y on y.Code = diagnosis 
WHERE flag <> 1

Just replace #df by a CTE that way you can use the existing indexes.

CodePudding user response：

Use EXISTS instead:

SELECT DISTINCT
       d.pt_id
  FROM #df  d
 WHERE flag <> 1
   AND EXISTS (SELECT *
                 FROM #df_codes c
                WHERE (c.code = d.diagnosis AND c.code_id = '20')
                   OR (c.code = d.outcome_code AND c.code_id = '25');

If #df lists each patient a single time - then you don't need the DISTINCT. If you try using a join and a patient has both then you would get 2 rows - and then use DISTINCT to eliminate the 'duplicates'.