Home > Software engineering >  Filter for distinct counts
Filter for distinct counts

Time:03-01

I want to count the number of times each category of feathers appear in the dataset, and then filter only for those that have been counted more than 5 times when the column beak has the category long.

However, I'm getting the following error:

near "(": syntax error

SELECT 
    land_birds.feather, land_birds.weight, COUNT(DISTINCT land_birds.feather) AS numFeathers,
    land_birds.size, sea_birds.beak
     
FROM
    land_birds
INNER JOIN
    sea_birds
ON 
    land_birds.colour = sea_birds.colour
WHERE sea_birds.colour IN (SELECT colour from land_birds) AND beak LIKE 'Long'
GROUP BY feather
ORDER BY feather ASC
FILTER(WHERE numFeathers > 5)

CodePudding user response:

To filter on information that is generated by grouping you use a HAVING clause which is placed immediately after the GROUP BY clause, like this:

SELECT
      land_birds.feather
    , land_birds.weight
    , COUNT(DISTINCT land_birds.feather) AS numFeathers
    , land_birds.size
    , sea_birds.beak
FROM land_birds
INNER JOIN sea_birds ON land_birds.colour = sea_birds.colour
WHERE beak LIKE 'Long'
GROUP BY land_birds.feather
HAVING COUNT(DISTINCT land_birds.feather) > 5
ORDER BY land_birds.feather ASC 

Whilst it may seem logical in the having clause to use the "numFeathers" alias you gave that calculation, don't. Reference the calculation itself instead. It may help to remember this that you can reference a grouping calculation in the having clause that does not appear in the select clause e.g. this would still work

SELECT
      land_birds.feather
    , land_birds.weight

    , land_birds.size
    , sea_birds.beak
FROM land_birds
INNER JOIN sea_birds ON land_birds.colour = sea_birds.colour
WHERE beak LIKE 'Long'
GROUP BY land_birds.feather
HAVING COUNT(DISTINCT land_birds.feather) > 5
ORDER BY land_birds.feather ASC 

Here that there is no column alias at all for that calculation.


Other observations about your query.

  1. always use the table name (or table alias) when referring to columns throughout your query
  2. the inner join condition about colour means there can only be rows in the result that exactly match that condition. Hence it is NOT required that you also include the same condition in the where clause.

One final note, please don't treat the having clause as a substitute for a where clause. The where clause occurs before the grouping and hence it reduces the amount of data to be grouped. A having clause filters on the generated information which can only exist after grouping. In short they are very different clauses with specific capabilities and uses.

  • Related