I have one table like bellow. Each id is unique.
id | times_of_going_out |
---|---|
fef666 | 2 |
S335gg | 1 |
9a2c50 | 1 |
and another table like this one ↓. In this second table the "id" is not unique, there are different "category_name" for a single id.
id | category_name | city |
---|---|---|
S335gg | Games & Game Supplies | tk |
9a2c50 | Telephone Companies | os |
9a2c50 | Recreation Centers | ky |
fef666 | Recreation Centers | ky |
I want to find the difference between destinations(category_name) of people who go out often(times_of_going_out<5) and people who don't go out often(times_of_going_out<=5).
** Both tables are a small sample of large tables.
・ Where do people who go out twice often go? ・ Where do people who go out 6times often go? Thank you
The expected result could be something like
less than 5 | more than 5 |
---|---|
top ten “category_name” for uid’s with "times_of_going_out" less than 5 times | top ten “category_name” for uid’s with "times_of_going_out" more than 5 times |
CodePudding user response:
Steps:
- combining data and aggregating total time_going_out
- creating the categories that you need : less than equal to 5 and more than 5. if you don't need equal to 5, you can adjust the code
- ranking both categories with top 10, using dense_rank(). this will produce the rank from 1 - 10 based on the total time_going out
- filtering the cases so it takes top 10 values for both categories
with main as (
select
category_name,
sum(coalesce(times_of_going_out,0)) as total_time_per_category
from table1 as t1
left join table2 as t2
on t1.id = t2.id
group by 1
),
category as (
select
*,
if(total_time_per_category >= 5, 'more than 5', 'less than equal to 5') as is_more_than_5_times
from main
),
ranking_ as (
select *,
case when
is_more_than_5_times = 'more than 5' then
dense_rank() over (partition by is_more_than_5_times order by total_time_per_category desc)
else NULL
end AS rank_more_than_5,
case when
is_more_than_5_times = 'less than equal to 5' then
dense_rank() over (partition by is_more_than_5_times order by total_time_per_category)
else NULL
end AS rank_less_than_equal_5
from category
)
select
is_more_than_5_times,
string_agg(category_name,',') as list
from ranking_
where rank_less_than_equal_5 <=10 or rank_more_than_5 <= 10
group by 1