Home > Back-end >  SQLServer join order
SQLServer join order

Time:10-24

From the SQLServer SELECT docs:

The following steps show the logical processing order, or binding order, for a SELECT statement. This order determines when the objects defined in one step are made available to the clauses in subsequent steps. For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps.

  1. FROM
  2. ON
  3. JOIN

My question is what is the difference between the ON and JOIN in the execution plan? For example, if the query is something like:

SELECT *
FROM person JOIN county ON person.nationality=country.code

I would understand that the first step would be to check privileges for the FROM table(s):

  • Does the user have access to tables person and country ?

But then if for example a nested loop join is done to join the two tables, what is the difference between the ON and JOIN? And based on that difference, why does the ON need to come before the JOIN ?

I suppose the only thing I can think of is it will first check the ON clause to make sure the join makes sense. Two examples might be:

SELECT *
FROM person JOIN county ON 1=0 -- never need to do the join

And:

SELECT *
FROM person JOIN county ON person.badcolumn = country.code

CodePudding user response:

I think you are misunderstanding what this document is talking about:

I would understand that the first step would be to check privileges for the FROM table(s)

This is not what it is saying, it is not referring to user permissions. It is referring to how each part of a query can refer to logically previous parts of the query, but not to later parts, and explains why the SELECT cannot be referred to even though it is written earlier in the query. It is a compile-time restriction in how the query is built, not a runtime restriction.

In a query such as this:

SELECT *
FROM person p
JOIN county c ON p.nationality = c.code
  • The FROM is evaluated first, and cannot refer to any other part of the query.
    For example, you cannot do this
SELECT *
FROM OPENJSON(p.JsonColumn)
CROSS JOIN person p

It would have to be in the opposite order

SELECT *
FROM person p
CROSS APPLY OPENJSON(p.JsonColumn)
  • The ON clause comes next, before JOIN, this is unclear what the difference is, as you say. I believe this merely refers to a nested join clause, such as the below
SELECT *
FROM person p
JOIN county c
    JOIN state s ON c.state_id = s.state_id
  ON p.nationality = c.code

This may be a slight misconception by Microsoft, because in this case the nested join cannot refer to any part of the query, even the FROM

  • The JOINs and APPLYs come next, and each can refer to the previous in strictly textual order.

if for example a nested loop join is done to join the two tables

The type of join done by the compiler is irrelevant to this question.

it will first check the ON clause to make sure the join makes sense

No, that is purely a runtime semantic. You can join on any condition you can produce, even if at runtime it is known to produce no rows. The compiler may optimize it out, but it is allowed.

  • Related