sql query for aggregation multiple columns-CodePudding

I would like to write a query on presto sql.

The table:

words	id1	id2	id2_like	rank
baseball	28	2756	1.	6
baseball	28	3180.	0.	5
baseball	28.	8161.	0.	17
baseball	11.	1723	0.	22
baseball	11.	5329.	1.	29
football.	19.	3210.	1.	2
football.	19.	5519	0.	18
football.	19.	6257	1.	3

"id2_like" depends on "id2" and it can only be 1 or 0 . I would like to get some aggregation results from the above table within one sql query.

for each "words", we need to get

total number of id2_like =1
percentage of id2_like as 0 out of total id2_like
number of id1 where id2_like = 0
average over id1 the max rank of id2_like=0
average percentage of id2 as 0 over id1 (in case some id2_like=1 and some are 0)

I know how to develop query for each one but I am not sure how to get all of them within one single sql query.

expected results:

words.   id1_cnt_for_id2_as_1  perc_id2_as_0  id1_cnt_for_id2_as_0_perc.   max_rank_id2_as_0   avg_perc_id2_as_0
baseball     2                     3/5        2                                (17 22)/2               (2/3 1/2)/2   
  
football.  2.                       2/3.     1.                             18                    1/3.

CodePudding user response：

If I understand correctly here is what you want, however I didn't understand what you want for number 5

select words
    , sum(id2_like) as id2_like_total
    , sum(id2_like) * 100.0 / count(*) as perc_id2_like
    , count(distinct case when id2_like =0 then id1 end) id1_count
from tablename
group by words

CodePudding user response：

Hope it helps you to get some idea of what to do, tested in AWS Athena (pretty much like presto under the hood). Did not understood the fifth question.

SELECT 
        words, 
        item_1,  
        item_1 / CAST(size as decimal(10,4)) * 100 as item_2, 
        size - item_1 as item_3,
        max_rank as item_4
    FROM (
         SELECT 
              words, 
              SUM(id2_like) as item_1, 
              COUNT(*) as size,
              AVG(id1/CAST((SELECT MAX(rank) FROM tb WHERE id2_like = 0) as decimal(10,4))) as max_rank
         FROM tb 
         GROUP BY 1
         )