New to CTE's and subqueries in SQL.
I have 3 tables:
categories (category_code, category)
countries (country_code, country, continent)
businesses (business, year_founded, category_code, country_code)
Goal is to look at oldest businesses in the world. I used a CTE:
WITH bus_cat_cont AS (
SELECT business, year_founded, category, country,
continent
FROM businesses AS b
INNER JOIN categories AS c1
ON b.category_code = c1.category_code
INNER JOIN countries AS c2
ON b.country_code = c2.country_code
)
SELECT continent,
category,
COUNT(business) AS n
FROM bus_cat_cont
WHERE n > 5
GROUP BY continent, category
ORDER BY n DESC;
The code works without WHERE n > 5
. But after adding that, I get the error:
column "n" does not exist
I realized there is a much easier way to get the output I want without a CTE.
But I'm wondering: Why do I get this error?
CodePudding user response:
This would work:
WITH bus_cat_cont AS (
SELECT business, year_founded, category, country, continent
FROM businesses AS b
JOIN categories AS c1 ON b.category_code = c1.category_code
JOIN countries AS c2 ON b.country_code = c2.country_code
)
SELECT continent, category, count(business) AS n
FROM bus_cat_cont
-- WHERE n > 5 -- wrong
GROUP BY continent, category
HAVING count(business) > 5 -- right
ORDER BY n DESC;
The output column name "n" is not visible (yet) in the WHERE
or HAVING
clause. Consider the sequence of events in an SQL query:
For the record, the result has no obvious connection to your declared goal to "look at oldest businesses in the world". year_founded
is unused in the query.
You get the most common continent/category combinations among businesses.
Aside, probably better:
SELECT co.continent, ca.category, n
FROM (
SELECT category_code, country_code, count(*) AS n
FROM businesses
GROUP BY 1, 2
HAVING count(*) > 5
) b
JOIN categories ca USING (category_code)
JOIN countries co USING (country_code)
ORDER BY n DESC;
There is really no need for a CTE.
Aggregate first, join later. See:
Beside being faster, this is also safer. While category_code
, country_code
should be defined UNIQUE
, the same may not be true for continent
and category
. (You may want to output codes additionally to disambiguate.)
count(*)
is implemented separately and slightly faster - and equivalent while business
is defined NOT NULL
.