I am querying a Presto table where I want to calculate what percentage of the total a certain subset of the rows account for.
Consider a table like this:
id | m |
---|---|
1 | 5 |
1 | 7 |
2 | 9 |
3 | 8 |
I want to query to report how much of the total measure (m) is contributed by each id. In this example, the total of the measure column is 29 can I find it with a query like...
SELECT SUM("m") FROM t;
output:
sqlite> SELECT SUM("m") FROM t;
29
Then I want to subtotal by id for some of the ids like
SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;
output:
sqlite> SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;
1|12
3|8
Now I want to add a third column where the subtotals are divided by the grand total (29) to get the percentage for each selected id.
I tried:
sqlite>
WITH a AS (
SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
WHERE "t"."id" IN ('1','3') GROUP BY "t"."id";
output:
1|12|41
3|8|27
Which is all good in SQLLite3! But when I translate this to my actual Presto DB (and the right tables and columns), I get this error:
presto error: line 10:5: 'a.g' must be an aggregate expression or appear in GROUP BY clause
I can't understand what I'm missing here or why this would be different in Presto.
CodePudding user response:
When you have a GROUP BY
in your query, all expressions that the query is returning must be either:
- the expression you are grouping by
- or aggregate function
For example if you do GROUP BY id
, the resulting query will return one row per id
- you cannot just use m
, because with id = 1
there are two values: 5
and 7
- so what should be returned? First value, last, sum, average? You need to tell it using aggregate function like sum(m)
.
Same with a.g
- you need to add it to GROUP BY
.
WITH a AS (
SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
WHERE "t"."id" IN ('1','3') GROUP BY "t"."id", "a"."g";
There's nothing special about PrestoDB here, it's more SQLite that's less strict, actually most other database engines would complain about your case.