Home > Enterprise >  sql query for aggregation multiple columns
sql query for aggregation multiple columns

Time:03-09

I would like to write a query on presto sql.

The table:

words id1 id2 id2_like rank
baseball 28 2756 1. 6
baseball 28 3180. 0. 5
baseball 28. 8161. 0. 17
baseball 11. 1723 0. 22
baseball 11. 5329. 1. 29
football. 19. 3210. 1. 2
football. 19. 5519 0. 18
football. 19. 6257 1. 3

"id2_like" depends on "id2" and it can only be 1 or 0 . I would like to get some aggregation results from the above table within one sql query.

for each "words", we need to get

  1. total number of id2_like =1
  2. percentage of id2_like as 0 out of total id2_like
  3. number of id1 where id2_like = 0
  4. average over id1 the max rank of id2_like=0
  5. average percentage of id2 as 0 over id1 (in case some id2_like=1 and some are 0)

I know how to develop query for each one but I am not sure how to get all of them within one single sql query.

expected results:

words.   id1_cnt_for_id2_as_1  perc_id2_as_0  id1_cnt_for_id2_as_0_perc.   max_rank_id2_as_0   avg_perc_id2_as_0
baseball     2                     3/5        2                                (17 22)/2               (2/3 1/2)/2   
  
football.  2.                       2/3.     1.                             18                    1/3.   
   

CodePudding user response:

If I understand correctly here is what you want, however I didn't understand what you want for number 5

select words
    , sum(id2_like) as id2_like_total
    , sum(id2_like) * 100.0 / count(*) as perc_id2_like
    , count(distinct case when id2_like =0 then id1 end) id1_count
from tablename
group by words

CodePudding user response:

Hope it helps you to get some idea of what to do, tested in AWS Athena (pretty much like presto under the hood). Did not understood the fifth question.

SELECT 
        words, 
        item_1,  
        item_1 / CAST(size as decimal(10,4)) * 100 as item_2, 
        size - item_1 as item_3,
        max_rank as item_4
    FROM (
         SELECT 
              words, 
              SUM(id2_like) as item_1, 
              COUNT(*) as size,
              AVG(id1/CAST((SELECT MAX(rank) FROM tb WHERE id2_like = 0) as decimal(10,4))) as max_rank
         FROM tb 
         GROUP BY 1
         ) 
  • Related