Home > Net >  LATERAL syntax in MySQL - Is it just to say that "the left table" is executed first so tha
LATERAL syntax in MySQL - Is it just to say that "the left table" is executed first so tha

Time:07-01

What does "A derived table cannot contain references to other tables of the same SELECT" mean? I looked it up in the MySQL documentation

SELECT
  salesperson.name,
  max_sale.amount,
  max_sale_customer.customer_name
FROM
  salesperson,
  -- calculate maximum size, cache it in transient derived table max_sale
  LATERAL
  (SELECT MAX(amount) AS amount
    FROM all_sales
    WHERE all_sales.salesperson_id = salesperson.id)
  AS max_sale,
  -- find customer, reusing cached maximum size
  LATERAL
  (SELECT customer_name
    FROM all_sales
    WHERE all_sales.salesperson_id = salesperson.id
    AND all_sales.amount =
        -- the cached maximum size
        max_sale.amount)
  AS max_sale_customer;

A derived table cannot contain references to other tables of the same SELECT (use a LATERAL derived table for that; see Section 13.2.11.9, “Lateral Derived Tables”)

CodePudding user response:

The key to understanding this is in the manual you have read:

Derived tables must be constant over the query's duration, not contain references to columns of other FROM clause tables.

That is, think of the derived table as running at the initial time of the query, before any rows from the joined tables have been read. If the derived table subquery includes any references to joined tables, then the result of the derived table would depend on data read from those tables. Can conflicts with the requirement that a derived table be constant.

LATERAL changes that. It ensures that the derived table is evaluated later, after the query begins to read rows from the joined tables.

P.S.: The comment from jarlh above is not strictly correct. It is recommended for other reasons to use explicit JOIN syntax (e.g. it supports outer joins and so on), but it's not required for the sake of LATERAL.

CodePudding user response:

The documentation (here) attempts to explain that if you start with a correlated subquery inside the select list then it IS possible to refer to the table in the from clause e.g.:

SELECT
  salesperson.name,
  -- find maximum sale size for this salesperson
  (SELECT MAX(amount) AS amount
    FROM all_sales
    WHERE all_sales.salesperson_id = salesperson.id)
  AS amount
FROM
  salesperson;

In the query above you can see that the from table is salesperson but in the correlated subquery you CAN also refer to salesperson. The correlation is that the subquery is being executed row by row through the resultset, with each iteration of that subquery using the salesperson.id as input to the where clause. (By the way, this "row by row" effect is often why correlated subqueries can be a cause of slowness in queries.)

However it is not permitted in traditional derived tables, found in the from clause, to use a correlation like the one seen above. So the example below will fail because it tries to use a correlation to salesperson that isn't allowed:

SELECT
  salesperson.name,
  max_sale.amount,
  max_sale_customer.customer_name
FROM
  salesperson,
  -- calculate maximum size, cache it in transient derived table max_sale
  (SELECT MAX(amount) AS amount
    FROM all_sales
    WHERE all_sales.salesperson_id = salesperson.id)
  AS max_sale
  AS max_sale_customer;

As the documentation states this will result in an error:

ERROR 1054 (42S22): Unknown column 'salesperson.id' in 'where clause'

BUT, with the addition of support for LATERAL it is possible to to replace correlated subqueries that we once used in the select clause with lateral subqueries as these DO support the ability to refer to other tables referenced in the from clause. Additionally (which can't be done in a select clause) one lateral subquery can refer to the result of any prior lateral subquery which is a really very useful feature, and this is also shown by the example in the documentation:

SELECT
  salesperson.name,
  max_sale.amount,
  max_sale_customer.customer_name
FROM
  salesperson,
  -- calculate maximum size, cache it in transient derived table max_sale
  LATERAL
  (SELECT MAX(amount) AS amount
    FROM all_sales
    WHERE all_sales.salesperson_id = salesperson.id)
  AS max_sale,
  -- find customer, reusing cached maximum size
  LATERAL
  (SELECT customer_name
    FROM all_sales
    WHERE all_sales.salesperson_id = salesperson.id
    AND all_sales.amount =
        -- the cached maximum size
        max_sale.amount) -- refers to prior lateral subquery result
  AS max_sale_customer;

So, conventional derived tables cannot use correlations, lateral subqueries can.

  • Related