What does "A derived table cannot contain references to other tables of the same SELECT" mean? I looked it up in the MySQL documentation
SELECT
salesperson.name,
max_sale.amount,
max_sale_customer.customer_name
FROM
salesperson,
-- calculate maximum size, cache it in transient derived table max_sale
LATERAL
(SELECT MAX(amount) AS amount
FROM all_sales
WHERE all_sales.salesperson_id = salesperson.id)
AS max_sale,
-- find customer, reusing cached maximum size
LATERAL
(SELECT customer_name
FROM all_sales
WHERE all_sales.salesperson_id = salesperson.id
AND all_sales.amount =
-- the cached maximum size
max_sale.amount)
AS max_sale_customer;
A derived table cannot contain references to other tables of the same SELECT (use a LATERAL derived table for that; see Section 13.2.11.9, “Lateral Derived Tables”)
CodePudding user response:
The key to understanding this is in the manual you have read:
Derived tables must be constant over the query's duration, not contain references to columns of other FROM clause tables.
That is, think of the derived table as running at the initial time of the query, before any rows from the joined tables have been read. If the derived table subquery includes any references to joined tables, then the result of the derived table would depend on data read from those tables. Can conflicts with the requirement that a derived table be constant.
LATERAL
changes that. It ensures that the derived table is evaluated later, after the query begins to read rows from the joined tables.
P.S.: The comment from jarlh above is not strictly correct. It is recommended for other reasons to use explicit JOIN
syntax (e.g. it supports outer joins and so on), but it's not required for the sake of LATERAL
.
CodePudding user response:
The documentation (here) attempts to explain that if you start with a correlated subquery
inside the select list
then it IS possible to refer to the table in the from clause
e.g.:
SELECT salesperson.name, -- find maximum sale size for this salesperson (SELECT MAX(amount) AS amount FROM all_sales WHERE all_sales.salesperson_id = salesperson.id) AS amount FROM salesperson;
In the query above you can see that the from table is salesperson but in the correlated subquery
you CAN also refer to salesperson. The correlation
is that the subquery is being executed row by row through the resultset, with each iteration of that subquery using the salesperson.id as input to the where clause. (By the way, this "row by row" effect is often why correlated subqueries
can be a cause of slowness in queries.)
However it is not permitted in traditional derived tables
, found in the from clause
, to use a correlation
like the one seen above. So the example below will fail because it tries to use a correlation
to salesperson that isn't allowed:
SELECT salesperson.name, max_sale.amount, max_sale_customer.customer_name FROM salesperson, -- calculate maximum size, cache it in transient derived table max_sale (SELECT MAX(amount) AS amount FROM all_sales WHERE all_sales.salesperson_id = salesperson.id) AS max_sale AS max_sale_customer;
As the documentation states this will result in an error:
ERROR 1054 (42S22): Unknown column 'salesperson.id' in 'where clause'
BUT, with the addition of support for LATERAL
it is possible to to replace correlated subqueries
that we once used in the select clause
with lateral subqueries
as these DO support the ability to refer to other tables referenced in the from clause
. Additionally (which can't be done in a select clause
) one lateral subquery
can refer to the result of any prior lateral subquery
which is a really very useful feature, and this is also shown by the example in the documentation:
SELECT salesperson.name, max_sale.amount, max_sale_customer.customer_name FROM salesperson, -- calculate maximum size, cache it in transient derived table max_sale LATERAL (SELECT MAX(amount) AS amount FROM all_sales WHERE all_sales.salesperson_id = salesperson.id) AS max_sale, -- find customer, reusing cached maximum size LATERAL (SELECT customer_name FROM all_sales WHERE all_sales.salesperson_id = salesperson.id AND all_sales.amount = -- the cached maximum size max_sale.amount) -- refers to prior lateral subquery result AS max_sale_customer;
So, conventional derived tables cannot use correlations, lateral subqueries can.