Home > Software engineering >  ORDER BY and UNION
ORDER BY and UNION

Time:10-11

So I'm trying to union two queries and then order by a column. However, whenever I try and run the query it gives an error that doesn't make sense to me.

Example data

CREATE TABLE test (
    Company_code    VARCHAR(120) NOT NULL,
    operating_year  INT          NOT NULL,
    Profit          INT          NOT NULL
);

INSERT INTO test (Company_code, operating_year, Profit)
VALUES  ('A', 1999, 2000),
        ('A', 2000, 3000),
        ('B', 1999, 1600),
        ('B', 2000, 4000);

Query

SELECT 
    t.company_code,
    t.profit
FROM
    test t
WHERE
    t.company_code = 'A'

UNION

SELECT
    t.company_code,
    t.profit
FROM
    test t
WHERE
    t.company_code = 'B'

ORDER BY
--    t.profit; --- Does *not* work
--    profit;   --- Does work

Ignore the very basic example, and how just adding an OR to the WHERE statement resolves this.

My question is why does having an alias in the ORDER BY throw an error when a UNION is involved. But not when run individually?

CodePudding user response:

Because you're not sorting the separate queries that reference columns through a table alias. You are ordering the UNION query output which is almost like a table itself.

The ORDER BY is performed after the UNION this is equivalent:

SELECT * FROM (
    SELECT 
    t.company_code,
    t.profit
    FROM test t
    WHERE t.company_code = 'A'
    UNION
    SELECT
    t.company_code,
    t.profit
    FROM test t
    WHERE t.company_code = 'B'
) t
ORDER BY t.profit; --- Should work
--ORDER BY profit; --- Does work

CodePudding user response:

My question is why does having an alias (t) in the ORDER BY throw an error when a UNION is involved. But not when run individually.

When you use a set-theoretic operator between two queries (e.g. UNION, UNION ALL, EXCEPT, INTERSECT, etc) the ORDER BY clause now applies to the end result of that operator, which is a new anonymous derived-table. So there's no way to bind t in ORDER BY because it is now out-of-scope.

If you add parentheses around the derived-tables it's easier to see why... so your query is really like this pseudo-SQL:

SELECT
    company_code,
    profit
FROM
    (
        SELECT 
            t.company_code,
            t.profit
        FROM
            test t
        WHERE
            t.company_code = 'A'
    ) AS q1

    UNION

    (
        SELECT 
            t.company_code,
            t.profit
        FROM
            test t
        WHERE
            t.company_code = 'B'
    ) AS q2
ORDER BY
    t.profit <-- `t` is not in scope. It's masked by (or "buried inside"?) q1 and q2.
    profit   <-- this works because `profit` is in the result of the UNION
  • Related