Postgresql "Column must appear in the GROUP BY clause or be used in an aggregate function"-CodePudding

I get the column "measurementResults.value" must appear in the GROUP BY clause or be used in an aggregate function error for this query:

SELECT avg("measurementResults"."value") AS "value",
       min("measurement"."timestamp") AS "timestamp",
       min("testProtocolItemResults"."ranking") AS "ranking"
FROM "measurement"
JOIN "measurementResults" ON "measurementResults"."measurement" = "measurement"."id"
JOIN "conditions" ON "conditions"."measurement" = "measurement"."id"
JOIN "testProtocolItemResults" ON "testProtocolItemResults"."id" = "measurementResults"."testProtocolItemResults"
JOIN "testProtocolSessionItem" ON "testProtocolSessionItem"."id" = "measurement"."testProtocolSessionItem"
WHERE "measurement"."athlete" = 334
  AND "measurementResults"."testProtocolItemResults" = 1
  AND "conditions"."conditions" = '6'
GROUP BY "testProtocolSessionItem"."testProtocolSession",
         "testProtocolItemResults"."ranking"
ORDER BY (CASE
              WHEN "ranking" = 'greater'::text THEN "value"
              ELSE NULL::double precision
          END) DESC
LIMIT 3

If I change the ORDER BY clause like this, it works as expected:

...
ORDER BY "value" DESC
...

What am I doing wrong?

CodePudding user response：

Your problem has a couple of roots:

Most importantly, don't use the same name for an output column that is distinct from an input column (of the same name). That's a loaded foot-gun.

Secondly, make it a habit to table-qualify all columns used in a complex query involving multiple tables. Even if that seems to work, it might already be doing something else than you think. And even if it works correctly it may break later, if any column names are changed (added, removed renamed). With some bad luck it breaks silently, and your query happily ever after produces nonsense.

Thirdly, the SQL standard, which has somewhat confusing visibility rules. See:

GROUP BY CASE statement

In your working alternative query, "value" resolves to the output column "value", which hides any input column of the same name in ORDER BY. That works as expected (that is, if you actually meant to target the output column).

In your failing query, "value" resolves to the input column "measurementResults.value". You cannot throw output columns into a new computation in ORDER BY, you can only use them "as is". So, with output columns out of the way, "value" resolves to the input column (now not hidden any more). And that leads to the reported error. Obviously, you cannot order by an input column after aggregating - except if you grouped by it, directly or indirectly.

You could repair your query with:

ORDER  BY (ranking = 'greater') IS TRUE, "value" DESC

The sorts all rows where ranking = 'greater' is not true to the top - like your CASE expression would. So treating null and false alike.

Subtle difference: Those leading rows are sorted by value, while your original would list them in arbitrary order. May or may not be welcome.

I assume you are aware that null values sort on top in descending order? And that you can change that? See:

Sort by column ASC, but NULL values first?

If that's not good enough (or for more complex expressions), you must be more verbose and explicit: one way is to wrap the whole query into a subquery, and order (and limit!) in the outer SELECT:

SELECT avg_value, min_timestamp, min_ranking
FROM  (
   SELECT ir.ranking                          -- !
        , avg(mr."value")    AS avg_value     -- !
        , min(m."timestamp") AS min_timestamp -- !
        , min(ir.ranking)    AS min_ranking   -- !
   FROM   measurement               m
   JOIN   "measurementResults"      mr ON mr.measurement = m.id
   JOIN   conditions                c  ON c.measurement = m.id
   JOIN   "testProtocolItemResults" ir ON ir.id = mr."testProtocolItemResults"
   JOIN   "testProtocolSessionItem" si ON si.id = m."testProtocolSessionItem"
   WHERE  m."athlete" = 334
   AND    mr."testProtocolItemResults" = 1
   AND    c."conditions" = '6'
   GROUP  BY si."testProtocolSession", ir.ranking
   ) sub
ORDER  BY CASE WHEN ranking = 'greater' THEN "value" END DESC
LIMIT  3

Especially for queries with a small LIMIT, this may be more expensive if Postgres cannot optimize the query plan as well any more.

Aside:
Use legal, loser-case identifiers, so you don't have to double-quote.
And use table aliases to de-noise your big queries.