I want to count the number of times each category of feathers appear in the dataset, and then filter only for those that have been counted more than 5 times when the column beak
has the category long
.
However, I'm getting the following error:
near "(": syntax error
SELECT
land_birds.feather, land_birds.weight, COUNT(DISTINCT land_birds.feather) AS numFeathers,
land_birds.size, sea_birds.beak
FROM
land_birds
INNER JOIN
sea_birds
ON
land_birds.colour = sea_birds.colour
WHERE sea_birds.colour IN (SELECT colour from land_birds) AND beak LIKE 'Long'
GROUP BY feather
ORDER BY feather ASC
FILTER(WHERE numFeathers > 5)
CodePudding user response:
To filter on information that is generated by grouping you use a HAVING
clause which is placed immediately after the GROUP BY
clause, like this:
SELECT
land_birds.feather
, land_birds.weight
, COUNT(DISTINCT land_birds.feather) AS numFeathers
, land_birds.size
, sea_birds.beak
FROM land_birds
INNER JOIN sea_birds ON land_birds.colour = sea_birds.colour
WHERE beak LIKE 'Long'
GROUP BY land_birds.feather
HAVING COUNT(DISTINCT land_birds.feather) > 5
ORDER BY land_birds.feather ASC
Whilst it may seem logical in the having clause to use the "numFeathers" alias you gave that calculation, don't. Reference the calculation itself instead. It may help to remember this that you can reference a grouping calculation in the having clause that does not appear in the select clause e.g. this would still work
SELECT
land_birds.feather
, land_birds.weight
, land_birds.size
, sea_birds.beak
FROM land_birds
INNER JOIN sea_birds ON land_birds.colour = sea_birds.colour
WHERE beak LIKE 'Long'
GROUP BY land_birds.feather
HAVING COUNT(DISTINCT land_birds.feather) > 5
ORDER BY land_birds.feather ASC
Here that there is no column alias at all for that calculation.
Other observations about your query.
- always use the table name (or table alias) when referring to columns throughout your query
- the inner join condition about colour means there can only be rows in the result that exactly match that condition. Hence it is NOT required that you also include the same condition in the where clause.
One final note, please don't treat the having clause as a substitute for a where clause. The where clause occurs before the grouping and hence it reduces the amount of data to be grouped. A having clause filters on the generated information which can only exist after grouping. In short they are very different clauses with specific capabilities and uses.