Home > Enterprise >  Using a WITH as an aggregate value
Using a WITH as an aggregate value

Time:02-23

I am querying a Presto table where I want to calculate what percentage of the total a certain subset of the rows account for.

Consider a table like this:

id m
1 5
1 7
2 9
3 8

I want to query to report how much of the total measure (m) is contributed by each id. In this example, the total of the measure column is 29 can I find it with a query like...

SELECT SUM("m") FROM t;

output:

sqlite> SELECT SUM("m") FROM t;
29

Then I want to subtotal by id for some of the ids like

SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;

output:

sqlite> SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;
1|12
3|8

Now I want to add a third column where the subtotals are divided by the grand total (29) to get the percentage for each selected id.

I tried:

sqlite>
WITH a AS (
 SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
  WHERE "t"."id" IN ('1','3') GROUP BY "t"."id";

output:

1|12|41
3|8|27

Which is all good in SQLLite3! But when I translate this to my actual Presto DB (and the right tables and columns), I get this error:

presto error: line 10:5: 'a.g' must be an aggregate expression or appear in GROUP BY clause

I can't understand what I'm missing here or why this would be different in Presto.

CodePudding user response:

When you have a GROUP BY in your query, all expressions that the query is returning must be either:

  • the expression you are grouping by
  • or aggregate function

For example if you do GROUP BY id, the resulting query will return one row per id - you cannot just use m, because with id = 1 there are two values: 5 and 7 - so what should be returned? First value, last, sum, average? You need to tell it using aggregate function like sum(m).

Same with a.g - you need to add it to GROUP BY.

WITH a AS (
 SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
  WHERE "t"."id" IN ('1','3') GROUP BY "t"."id", "a"."g";

There's nothing special about PrestoDB here, it's more SQLite that's less strict, actually most other database engines would complain about your case.

  • Related