I would like to write a query on presto sql.
The table:
words | id1 | id2 | id2_like | rank |
---|---|---|---|---|
baseball | 28 | 2756 | 1. | 6 |
baseball | 28 | 3180. | 0. | 5 |
baseball | 28. | 8161. | 0. | 17 |
baseball | 11. | 1723 | 0. | 22 |
baseball | 11. | 5329. | 1. | 29 |
football. | 19. | 3210. | 1. | 2 |
football. | 19. | 5519 | 0. | 18 |
football. | 19. | 6257 | 1. | 3 |
"id2_like" depends on "id2" and it can only be 1 or 0 . I would like to get some aggregation results from the above table within one sql query.
for each "words", we need to get
- total number of
id2_like =1
- percentage of
id2_like
as 0 out of totalid2_like
- number of
id1
whereid2_like = 0
- average over
id1
the max rank ofid2_like=0
- average percentage of id2 as 0 over
id1
(in case someid2_like=1
and some are 0)
I know how to develop query for each one but I am not sure how to get all of them within one single sql query.
expected results:
words. id1_cnt_for_id2_as_1 perc_id2_as_0 id1_cnt_for_id2_as_0_perc. max_rank_id2_as_0 avg_perc_id2_as_0
baseball 2 3/5 2 (17 22)/2 (2/3 1/2)/2
football. 2. 2/3. 1. 18 1/3.
CodePudding user response:
If I understand correctly here is what you want, however I didn't understand what you want for number 5
select words
, sum(id2_like) as id2_like_total
, sum(id2_like) * 100.0 / count(*) as perc_id2_like
, count(distinct case when id2_like =0 then id1 end) id1_count
from tablename
group by words
CodePudding user response:
Hope it helps you to get some idea of what to do, tested in AWS Athena (pretty much like presto under the hood). Did not understood the fifth question.
SELECT
words,
item_1,
item_1 / CAST(size as decimal(10,4)) * 100 as item_2,
size - item_1 as item_3,
max_rank as item_4
FROM (
SELECT
words,
SUM(id2_like) as item_1,
COUNT(*) as size,
AVG(id1/CAST((SELECT MAX(rank) FROM tb WHERE id2_like = 0) as decimal(10,4))) as max_rank
FROM tb
GROUP BY 1
)