I've seen a great many different forms of left anti-semi join. Allow me to list and name every one that comes to mind. The following queries are intended to return every ROSTER_ID
that's not used by any employee and who is the owner of that roster.
--1) NOT EXISTS, with a particular column selected in the subquery
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--2) NOT EXISTS, with a particular column selected in the subquery and TOP (1) used
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT TOP (1) EMP_ID FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--3) NOT EXISTS, with all data selected in the subquery
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT * FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--4) NOT EXISTS, with all columns selected in the subquery and TOP (1) used
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT TOP (1) * FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--5) NOT EXISTS, but just use SELECT 1
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT 1 FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--6) NOT IN
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE EMP_ID NOT IN (SELECT EMP_ID FROM EMPLOYEES)
--7) LEFT JOIN
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
LEFT OUTER JOIN EMPLOYEES
ON EMPLOYEES.EMP_ID = ROSTERS.EMP_ID
WHERE EMPLOYEES.EMP_ID IS NULL
My question is this: Is there any objective reason - e.g. performance, backwards compatibility, portability, NULL-handling, ease of testing, extensibility, etc - to prefer any particular way of doing a left anti-semi join? I'm also interested to hear subjective reasons, e.g. style concerns or clarity, but only including them would be a non-answer.
My own research only points to the below, but it's all weak and probably subjective:
- Microsoft's U-SQL documentation hints that they prefer to use the NOT IN version (#6 of mine) in T-SQL.
NULL
handling withIN
is always cause for concern, which is a mild reason to preferEXISTS
over it.- If you're seriously concerned about backwards compatibility, then I think my LEFT JOIN syntax didn't work in the 1980's.
- Some people like to use
EXCEPT
, but I don't think that it generalises to cases where a column appears in only the outermostSELECT
.
CodePudding user response:
No, except for NOT IN sucking
Aaron's article already is full of juicy information, and you have already spotted the danger of using NOT IN
in combination with NULLS.
The only thing I can add that I didn't notice discussed is how, when using (not) exists
, the things before FROM
like select column
or select top 1 *
are essentially junk. I even posted a recommendation to sql server to have an alternative syntax of (anti) semi join table2 on <join condition>
.
CodePudding user response:
EXISTS
, NOT EXISTS
return booolean TRUE/FALSE
depending.The Select
columns mention in EXISTS
subquery do not matter,it only check whether it exists or not depending upon where cluase
For example,
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT 1/0 FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
In above notice 1/0
wont throw error ,this mean what ?
So your 1),2),3),4),5) are all equal in performance and result.
LEFT JOIN
:It is use when you require column from LEFT JOIN table in resultset.