I have 2 tables that have the exact same columns but different data. The columns are 'name', 'gender' and 'count'. The first table is called names_2014 and the second names_2015. My goal is simply to find the top 5 most popular names amongst both these tables.
I know that to get the most popular names for one table is:
SELECT name, count
FROM names_2014
ORDER BY count DESC
LIMIT 5;
However, the closest I've gotten to my goal is:
SELECT name, count
FROM names_2014
UNION DISTINCT -- I've tried UNION ALL as well
SELECT name, SUM(count)
FROM names_2015
GROUP BY name
ORDER BY count DESC
LIMIT 5
I've tried many similar variations to this but none of them are successful. It seems that I need to combine both of the tables, and then SUM(count) and GROUP BY name but I guess I'm not combining the tables properly. Any help is much appreciated as I've spent hours on this and I feel like the solution is so close but I just can't see it. I'm new to SQL and just trying to test my limits.
CodePudding user response:
You may perform the aggregation on a subquery that unions the two tables as the following:
select name, sum(count) cnt
from
(
select name, count
from names_2014
union all
select name, count
from names_2015
) T
group by name
order by cnt desc
limit 5
CodePudding user response:
From your final ask it's not clear if you want to separate these top 5 by source table or not. Following is one answer that you might be looking for:
with name_2014 as (
select 'a' as name, 'm' as gender, 1 as cnt
union all
select 'b' as name, 'f' as gender, 3 as cnt
union all
select 'c' as name, 'm' as gender, 2 as cnt
),
name_2015 as (
select 'd' as name, 'f' as gender, 10 as cnt
union all
select 'b' as name, 'f' as gender, 5 as cnt
union all
select 'e' as name, 'm' as gender, 1 as cnt
)
(select 'name_2014' as src_table_name, name, sum(cnt) as total_counts from name_2014 group by name order by 3 desc limit 1)
union all
(select 'name_2015' as src_table_name, name, sum(cnt) as total_counts from name_2015 group by name order by 3 desc limit 1)
This sample query will give you top 1 names per table. (You can change limit and get top 5 from your query.) If you do not want to know table names you can tweak the above query.
If you do not care about source tables at all and just want top 5 then:
with name_2014 as (
select 'a' as name, 'm' as gender, 1 as cnt
union all
select 'b' as name, 'f' as gender, 3 as cnt
union all
select 'c' as name, 'm' as gender, 2 as cnt
),
name_2015 as (
select 'd' as name, 'f' as gender, 10 as cnt
union all
select 'b' as name, 'f' as gender, 5 as cnt
union all
select 'e' as name, 'm' as gender, 1 as cnt
)
select name, sum(cnt) as total_count from
(select name, cnt from name_2014
union all
select name, cnt from name_2015)
group by 1 order by 2 desc limit 5