Home > database >  NOT IN vs concatenate columns
NOT IN vs concatenate columns

Time:04-29

Isn't both below SQL the same? I mean functionality wise should do the same thing?

I was expecting this first SQL should have got result as well.

SELECT * 
FROM #TEST 
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1) 
  AND COL2 NOT IN (SELECT COL2 FROM #TEST_1) 

--1 record

SELECT * 
FROM #TEST 
WHERE COL1   COL2 NOT IN (SELECT COL1   COL2 FROM #TEST_1)



CREATE TABLE #TEST 
(
    COL1 VARCHAR(10), 
    COL2 VARCHAR(10), 
    COL3 VARCHAR(10)
) 

INSERT INTO #TEST VALUES ('123', '321', 'ABC')
INSERT INTO #TEST VALUES ('123', '436', 'ABC')

CREATE TABLE #TEST_1
(
    COL1 VARCHAR(10), 
    COL2 VARCHAR(10), 
    COL3 VARCHAR(10)
) 

INSERT INTO #TEST_1 VALUES ( '123','532','ABC')
INSERT INTO #TEST_1 VALUES ( '123','436','ABC')

--No result
SELECT * 
FROM #TEST 
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1) 
  AND COL2 NOT IN (SELECT COL2 FROM #TEST_1) 

--1 record
SELECT * 
FROM #TEST 
WHERE COL1   COL2 NOT IN (SELECT COL1   COL2 FROM #TEST_1)

CodePudding user response:

Let's put this into a bit more context and look at your 2 WHERE clauses, which I'm going to call "WHERE 1" and "WHERE 2" respectively:

--WHERE 1
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1) 
  AND COL2 NOT IN (SELECT COL2 FROM #TEST_1) 

--WHERE 2
WHERE COL1   COL2 NOT IN (SELECT COL1   COL2 FROM #TEST_1)

As you might have noticed, this do not behave the same. In fact, from a logic point of view and the way the database engine would handle them they are completely different.

WHERE 2, to start with is not SARGable. This means that any indexes on your tables would not be able to able to be used and the data engine would have to scan the entire table. For WHERE 1, however, it is SARGable, and if you had any indexes, they could be used to perform seeks, likely helping with performance.

From the point of view of logic let's look at WHERE 2 first. This requires that the concatenated value of COL1 and COL2 not match the other concatenated value of COL1 and COL2; which means these values must be on the same row. So '123456' would match only when Col1 has the value '123' and Col2 the value '456'.

For WHERE 1, however, here the value of Col1 needs to be not found in the other table, and Col2 needs to be not found as well, but they can be on different rows. This is where things differ. As '123' in Col1 appears in both tables (and is the only value) then the NOT IN isn't fulfilled and no rows are returned.

In you wanted a SARGable version of WHERE 2, I would suggest using an EXISTS:

--1 row
SELECT T.COL1, --Don't use *, specify your columns
       T.COL2, --Qualifying your columns is important!
       T.COL3
FROM #TEST T --Aliasing is important!
WHERE NOT EXISTS (SELECT 1
                  FROM #TEST_1 T1
                  WHERE T1.COL1 = T.COL1
                    AND T1.COL2 = T.COL2);

db<>fiddle

CodePudding user response:

When you add strings in this way (using instead of concatenation) it adds the two strings and gives you numeric value. At the first query you are not adding strings so what you did is:

Select all rows from #Test that values of Col1 and Col2 are not in Test1 And actually, only first argument is cutting everything out, since you got 123 values in both tables in col1.

Second query sums that strings, but not by concatenation. It actually convert varchars to numbers behind the scene. So the second query does:

Select all rows from #test where COL1 COL2 (its 444 at first row, and 559 in second row) are not in #Test 1

And if you add rows at #Test1, values are:

For the first row COL1 COL2= 655

For the second row COL1 COL2= 559

So only the row with the sum of 444 is not at #Test1, thats why you get 1 row as result.

To sum up:

Thats why you see only 1 row at the second query, and you don't see any records at your first query. At the first query only first condition actually works and cuts everything. And at the second query SQL engine is converting varchars to numerics.

So '123' '321' is not '123321' but '444'.

  • Related