Home > Back-end >  Is there any objective reason to prefer a particular form of left anti-semi join?
Is there any objective reason to prefer a particular form of left anti-semi join?

Time:05-31

I've seen a great many different forms of left anti-semi join. Allow me to list and name every one that comes to mind. The following queries are intended to return every ROSTER_ID that's not used by any employee and who is the owner of that roster.

--1) NOT EXISTS, with a particular column selected in the subquery
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--2) NOT EXISTS, with a particular column selected in the subquery and TOP (1) used
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT TOP (1) EMP_ID FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--3) NOT EXISTS, with all data selected in the subquery
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT * FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--4) NOT EXISTS, with all columns selected in the subquery and TOP (1) used
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT TOP (1) * FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--5) NOT EXISTS, but just use SELECT 1
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT 1 FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--6) NOT IN
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE EMP_ID NOT IN (SELECT EMP_ID FROM EMPLOYEES)
--7) LEFT JOIN
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
LEFT OUTER JOIN EMPLOYEES
ON EMPLOYEES.EMP_ID = ROSTERS.EMP_ID
WHERE EMPLOYEES.EMP_ID IS NULL

My question is this: Is there any objective reason - e.g. performance, backwards compatibility, portability, NULL-handling, ease of testing, extensibility, etc - to prefer any particular way of doing a left anti-semi join? I'm also interested to hear subjective reasons, e.g. style concerns or clarity, but only including them would be a non-answer.

My own research only points to the below, but it's all weak and probably subjective:

  • Microsoft's U-SQL documentation hints that they prefer to use the NOT IN version (#6 of mine) in T-SQL.
  • NULL handling with IN is always cause for concern, which is a mild reason to prefer EXISTS over it.
  • If you're seriously concerned about backwards compatibility, then I think my LEFT JOIN syntax didn't work in the 1980's.
  • Some people like to use EXCEPT, but I don't think that it generalises to cases where a column appears in only the outermost SELECT.

CodePudding user response:

No, except for NOT IN sucking

Aaron's article already is full of juicy information, and you have already spotted the danger of using NOT IN in combination with NULLS.

The only thing I can add that I didn't notice discussed is how, when using (not) exists, the things before FROM like select column or select top 1 * are essentially junk. I even posted a recommendation to sql server to have an alternative syntax of (anti) semi join table2 on <join condition>.

CodePudding user response:

EXISTS, NOT EXISTS return booolean TRUE/FALSE depending.The Select columns mention in EXISTS subquery do not matter,it only check whether it exists or not depending upon where cluase

For example,

SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT 1/0 FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)

In above notice 1/0 wont throw error ,this mean what ?

So your 1),2),3),4),5) are all equal in performance and result.

LEFT JOIN :It is use when you require column from LEFT JOIN table in resultset.

  • Related