My
DELETE FROM FOO
WHERE [FOO_KEY] NOT IN
(
SELECT [FOO_KEY] FROM BAR
)
query is running shockingly slow. I know that BAR is a very big table, so I'm tempted to write
DELETE FROM FOO
WHERE [FOO_KEY] NOT IN
(
SELECT DISTINCT [FOO_KEY] FROM BAR
)
but I remember being told that:
- When
NULL
s aren't a problem (and they're not here) there's hardly any difference betweenIN
andEXISTS
. - When using
EXISTS
, you don't need to useSELECT DISTINCT
and there is no performance reason to do so.
This leaves me with good reason to believe that it is absolutely guaranteed that adding DISTINCT
here will not make a difference. Is that correct?
CodePudding user response:
From a functional point of view, the queries with or without DISTINCT
are identical (they would delete the same set of rows).
From a performance point of view, I am certain that SQL Server will always produce the same execution plan for both queries (but I cannot prove this).
For other database engines, this may be different. See: