Home > Blockchain >  how can I count some values for data in a table based on same key in another table in Bigquery?
how can I count some values for data in a table based on same key in another table in Bigquery?

Time:11-21

I have one table like bellow. Each id is unique.

id times_of_going_out
fef666 2
S335gg 1
9a2c50 1

and another table like this one ↓. In this second table the "id" is not unique, there are different "category_name" for a single id.

id category_name city
S335gg Games & Game Supplies tk
9a2c50 Telephone Companies os
9a2c50 Recreation Centers ky
fef666 Recreation Centers ky

I want to find the difference between destinations(category_name) of people who go out often(times_of_going_out<5) and people who don't go out often(times_of_going_out<=5).

** Both tables are a small sample of large tables.

 ・ Where do people who go out twice often go?  ・ Where do people who go out 6times often go? Thank you

The expected result could be something like

less than 5 more than 5
top ten “category_name” for uid’s with "times_of_going_out" less than 5 times top ten “category_name” for uid’s with "times_of_going_out" more than 5 times

CodePudding user response:

Steps:

  • combining data and aggregating total time_going_out
  • creating the categories that you need : less than equal to 5 and more than 5. if you don't need equal to 5, you can adjust the code
  • ranking both categories with top 10, using dense_rank(). this will produce the rank from 1 - 10 based on the total time_going out
  • filtering the cases so it takes top 10 values for both categories

with main as (

select 
category_name,
sum(coalesce(times_of_going_out,0)) as total_time_per_category
from table1 as t1
left join table2 as t2
on t1.id = t2.id
group by 1
),

category as (

select
*,

if(total_time_per_category >= 5, 'more than 5', 'less than equal to 5') as is_more_than_5_times
from main
), 

ranking_ as (

select *, 

case when
 is_more_than_5_times = 'more than 5' then 
dense_rank() over (partition by is_more_than_5_times order by total_time_per_category desc)
        else NULL
    end AS rank_more_than_5,

case when
 is_more_than_5_times = 'less than equal to 5' then 
dense_rank() over (partition by is_more_than_5_times order by total_time_per_category)
        else NULL
    end AS rank_less_than_equal_5

from category
)

select 
is_more_than_5_times,
string_agg(category_name,',') as list
from ranking_
where rank_less_than_equal_5 <=10 or rank_more_than_5 <= 10
group by 1

  • Related