Home > OS >  Does SELECT DISTINCT differ from SELECT when using a NOT IN clause?
Does SELECT DISTINCT differ from SELECT when using a NOT IN clause?

Time:05-25

My

DELETE FROM FOO
WHERE [FOO_KEY] NOT IN
  (
    SELECT [FOO_KEY] FROM BAR
  )

query is running shockingly slow. I know that BAR is a very big table, so I'm tempted to write

DELETE FROM FOO
WHERE [FOO_KEY] NOT IN
  (
    SELECT DISTINCT [FOO_KEY] FROM BAR
  )

but I remember being told that:

  1. When NULLs aren't a problem (and they're not here) there's hardly any difference between IN and EXISTS.
  2. When using EXISTS, you don't need to use SELECT DISTINCT and there is no performance reason to do so.

This leaves me with good reason to believe that it is absolutely guaranteed that adding DISTINCT here will not make a difference. Is that correct?

CodePudding user response:

From a functional point of view, the queries with or without DISTINCT are identical (they would delete the same set of rows).

From a performance point of view, I am certain that SQL Server will always produce the same execution plan for both queries (but I cannot prove this).

For other database engines, this may be different. See:

  • Related