Home > database >  LEFT JOIN with OR clause without UNION
LEFT JOIN with OR clause without UNION

Time:08-19

I know this shouldn't happen in a database, but it happened and we have to deal with it. We need to insert new rows into a table if they don't exist based on the values in another table. This is easy enough (just do LEFT JOIN and check for NULL values in 1st table). But...the join isn't very straight forward and we need to search 1st table on 2 conditions with an OR and not AND. So basically if it finds a match on either of the 2 attributes, we consider that the corresponding row in 1st table exists and we don't have to insert a new one. If there are no matches on either of the 2 attributes, then we consider it as a new row. We can use OR condition in the LEFT JOIN statement but from what I understand, it does full table scan and the query takes a very long time to complete even though it yields the right results. We cannot use UNION either because it will not give us what we're looking for. Just for simplicity purpose consider the scenario below (we need to insert data into tableA).

If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
    Drop Table #tableA End

If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
    Drop Table #tableB End

create table #tableA ( email nvarchar(50), id int )

create table #tableB ( email nvarchar(50), id int )


insert into #tableA (email, id) values ('[email protected]', 1), ('[email protected]', 2), ('[email protected]', 3), ('[email protected]', 4)

insert into #tableB (email, id) values ('[email protected]', 1), ('[email protected]', 2), ('[email protected]', 3), ('[email protected]', 4), ('[email protected]', 5)

 --THIS QUERY IS CORRECTLY RETURNING 1 RECORD  
 select B.email, B.id  
 from #tableB B  
 left join #tableA A on A.email = B.email or B.id = A.id  
 where A.id is null

 --THIS QUERY IS INCORRECTLY RETURNING 3 RECORDS SINCE THERE ARE ALREADY RECORDS WITH ID's 1 & 3 in tableA though the email addresses of these records don't match  
select B.email, B.id  
from #tableB B  
left join #tableA A on A.email = B.email  
where A.id is null  
union 
select B.email, B.id  
from #tableB B  
left join #tableA A on B.id = A.id  
where A.id is null


If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
    Drop Table #tableA End

If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
    Drop Table #tableB End

The 1st query works correctly and only returns 1 record, but the table size is just few records and it completes under 1 sec. When the 2 tables have thousands or records, the query may take 10 min to complete. The 2nd query of course returns the records we don't want to insert because we consider them existing. Is there a way to optimize this query so it takes an acceptable time to complete?

CodePudding user response:

You can still use UNION like so:

select email, id  
from #tableB B  
where id is not in (select id from #tableA) 
union 
select email, id  
from #tableB B  
where email is not in (select email from #tableA) 

CodePudding user response:

No the query returns correctly 3 rows

because

 select B.email, B.id  
from #tableB B  
left join #tableA A on A.email = B.email  
where A.id is null

Allone reurns the 3 rows.

For your "problemm"

 select B.email, B.id  
 from #tableB B  
 left join #tableA A on A.email = B.email or B.id = A.id  
 where A.id is null

will che3kc for every row, if it is true to be included

So for example

('[email protected]', 1)    ('[email protected]', 1)   

as the Ids are teh same it will be joined

but when you join by the emails the con dition is false an so is included in the result set

You can only use the UNION approach, when you are comparing only the emails or the ids, but with both the queries are not equivalent

  • Related