So I'm trying to union two queries and then order by a column. However, whenever I try and run the query it gives an error that doesn't make sense to me.
Example data
CREATE TABLE test (
Company_code VARCHAR(120) NOT NULL,
operating_year INT NOT NULL,
Profit INT NOT NULL
);
INSERT INTO test (Company_code, operating_year, Profit)
VALUES ('A', 1999, 2000),
('A', 2000, 3000),
('B', 1999, 1600),
('B', 2000, 4000);
Query
SELECT
t.company_code,
t.profit
FROM
test t
WHERE
t.company_code = 'A'
UNION
SELECT
t.company_code,
t.profit
FROM
test t
WHERE
t.company_code = 'B'
ORDER BY
-- t.profit; --- Does *not* work
-- profit; --- Does work
Ignore the very basic example, and how just adding an OR to the WHERE statement resolves this.
My question is why does having an alias in the ORDER BY
throw an error when a UNION
is involved. But not when run individually?
CodePudding user response:
Because you're not sorting the separate queries that reference columns through a table alias. You are ordering the UNION query output which is almost like a table itself.
The ORDER BY
is performed after the UNION
this is equivalent:
SELECT * FROM (
SELECT
t.company_code,
t.profit
FROM test t
WHERE t.company_code = 'A'
UNION
SELECT
t.company_code,
t.profit
FROM test t
WHERE t.company_code = 'B'
) t
ORDER BY t.profit; --- Should work
--ORDER BY profit; --- Does work
CodePudding user response:
My question is why does having an alias (
t
) in theORDER BY
throw an error when aUNION
is involved. But not when run individually.
When you use a set-theoretic operator between two queries (e.g. UNION
, UNION ALL
, EXCEPT
, INTERSECT
, etc) the ORDER BY
clause now applies to the end result of that operator, which is a new anonymous derived-table. So there's no way to bind t
in ORDER BY
because it is now out-of-scope.
If you add parentheses around the derived-tables it's easier to see why... so your query is really like this pseudo-SQL:
SELECT
company_code,
profit
FROM
(
SELECT
t.company_code,
t.profit
FROM
test t
WHERE
t.company_code = 'A'
) AS q1
UNION
(
SELECT
t.company_code,
t.profit
FROM
test t
WHERE
t.company_code = 'B'
) AS q2
ORDER BY
t.profit <-- `t` is not in scope. It's masked by (or "buried inside"?) q1 and q2.
profit <-- this works because `profit` is in the result of the UNION