Home > Software design >  Sql with strange join order
Sql with strange join order

Time:10-28

I inherited a SQL query with a syntax I'm not familiar with.

It looks something like

SELECT *
FROM #TableA AS TableA
     INNER JOIN #TableB AS TableB
     INNER JOIN #TableC AS TableC
         ON TableC.Id1 = TableB.Id1
         ON TableB.Id2 = TableA.Id2

and it works (SQL Server 2019).

There are several other unrelated queries using the same syntax so it is definitely intentional.

I first thought it might be that the order of ON clauses doesn't matter since they can all be moved to the WHERE clause anyway.

But then I tried

SELECT *
FROM #TableA AS TableA
     INNER JOIN #TableB AS TableB
     INNER JOIN #TableC AS TableC
         ON TableC.Id1 = TableB.Id1
            AND TableC.Id3 = TableA.Id3
         ON TableB.Id2 = TableA.Id2

And got squiggles with a tool tip

'TableA.Id3' could not be bound

Executing it results in an error

Incorrect syntax near 'Id3'

So can someone explain how this syntax works?

Here is setup code

CREATE TABLE #TableA (Id1 int, Id2 int, Id3 int);
CREATE TABLE #TableB (Id1 int, Id2 int, Id3 int);
CREATE TABLE #TableC (Id1 int, Id2 int, Id3 int);

CodePudding user response:

When these are all INNER JOINs this syntax is a little pointless, if I am honest, but I'll cover this off later.

Let's take the first statement you have:

SELECT *
FROM #TableA AS TableA
     INNER JOIN #TableB AS TableB
     INNER JOIN #TableC AS TableC
         ON TableC.Id1 = TableB.Id1
         ON TableB.Id2 = TableA.Id2

What we have here is a deferred ON clause for the INNER JOIN to #TableB. The second ON clause is actually for the INNER JOIN on #TableB.

When using this syntax, objects outside of the scope of the table the JOIN is deferred on cannot be referenced. It's not identical syntax (as this uses a derived table), but it might be easier to demonstrate why your second query fails with a query like this:

SELECT *
FROM #TableA AS TableA
     CROSS JOIN (SELECT *
                 FROM #TableB AS TableB
                      INNER JOIN #TableC AS TableC ON TableC.Id1 = TableB.Id1
                 WHERE TableB.Id2 = TableA.Id2) BC;

This will generate a similar error, as the table aliased as TableA cannot be referenced in the derived table.

This syntax, in my opinion, makes more sense when you have a LEFT JOIN involved. TAke the following instead:

SELECT *
FROM #TableA TableA
     LEFT JOIN #TableB TableB
                INNER JOIN #TableC TableC ON TableC.Id1 = TableB.Id1
                ON TableB.Id2 = TableA.Id2;

So here, for the LEFT JOIN to find a related row then there must be a related row in TableC to TableB, if no row from the INNER JOIN is found, then TableB also won't be joined to. This would not be the same as the following query:

SELECT *
FROM #TableA TableA
     LEFT JOIN #TableB TableB ON TableB.Id2 = TableA.Id2
     INNER JOIN #TableC TableC ON TableC.Id1 = TableB.Id1;

Here this would effectively turn the LEFT JOIN to TableB into an implicit INNER JOIN, due to a row needing to be found in TableC, and the only way a row could be found is it TableB.Id2 had a non-NULL value (which is impossible if no row was found).

  • Related