Home > Net >  When can aggregate functions be nested in standard SQL?
When can aggregate functions be nested in standard SQL?

Time:10-07

I know it wasn't allowed in SQL-92. But since then it may have changed, particularly when there's a window applied. Can you explain the changes and give the version (or versions if there were more) in which they were introduced?

Examples

  1. Is SUM(COUNT(votes.option_id)) OVER() valid syntax per standard SQL:2016 (or earlier)?

This is my comment (unanswered, an probably unlikely to in such an old question) in Why can you nest aggregate functions when using a window function in PostgreSQL?.

  1. The Calculating Running Total (SQL) kata at Codewars has as its most upvoted solution (using PostgreSQL 13.0, a highly standard compliant engine, so the code is likely to be standard) this one:
SELECT
  CREATED_AT::DATE AS DATE,
  COUNT(CREATED_AT) AS COUNT,
  SUM(COUNT(CREATED_AT)) OVER (ORDER BY CREATED_AT::DATE ROWS UNBOUNDED PRECEDING)::INT AS TOTAL
FROM
  POSTS
GROUP BY
  CREATED_AT::DATE

(Which could be simplified to:

SELECT
  created_at::DATE date,
  COUNT(*) COUNT,
  SUM(COUNT(*)) OVER (ORDER BY created_at::DATE)::INT total
FROM posts
GROUP BY created_at::DATE

I assume the ::s are a new syntax for casting I didn't know of. And that casting from TIMESTAMP to DATE is now allowed (in SQL-92 it wasn't).)

  1. As this SO answer explains, Oracle Database allows it even without a window, pulling in the GROUP BY from context. I don't know if the standard allows it.

CodePudding user response:

You already noticed the difference yourself: It's all about the window. COUNT(*) without an OVER clause for instance is an aggregation function. COUNT(*) with an OVER clause is a window function.

By using aggregation functions you condense the original rows you get after the FROM clause and WHERE clause are applied to either the specified group in GROUP BY or to one row in the absence of a GROUP BY clause.

Window functions, aka analytic functions, are applied afterwards. They don't change the number of result rows, but merely add information by looking at all or some rows (the window) of the selected data.

In

SELECT
    options.id,
    options.option_text,
    COUNT(votes.option_id) as vote_count,
    COUNT(votes.option_id) / SUM(COUNT(votes.option_id)) OVER() * 100.0 as vote_percentage
FROM options
LEFT JOIN votes on options.id = votes.option_id
GROUP BY options.id;

we first join votes to options and then count the votes per option by aggregating the joined rows down to one result row per option (GROUP BY options.id). We count on a non-nullable column in the votes table (COUNT(votes.option_id), so we get a zero count in case there are no votes, because in an outer joined row this column is set to null.

After aggregating all rows and getting thus one row per option we apply a window function (SUM() OVER) on this result set. We apply the analytic SUM on the vote count (SUM(COUNT(votes.option_id)) by looking at the whole result set (empty OVER clause), thus getting the same total vote count in every row. We use this value for a calculation: option's vote count diveded by total vote count times 100, which is the option's percentage of total votes.

The PostgreSQL query is very similar. We select the number of posts per date (COUNT(created_at) is nothing else than a mere COUNT(*)) along with a running total of these counts (by using a window that looks at all rows up to the current row).

So, while this looks like we are nesting two aggregate functions, this is not really the case, because SUM OVER is not considered an agregation function but an analytic/window function.

Oracle does allow applying an aggregate function directly on another, thus invoking a final aggregation on a previous grouped by aggregation. This allows us to get one result row of, say, the average of sums without having to write a subquery for this. This is not compliant with the SQL standard, however, and very unpopular even among Oracle developers at that.

  • Related