Suppose i have the following tables
Projects (manager,title,year, other irrelevant stuff) with title and year as primary key
ProjectAwards (title,year,award,result) with title, year as foreign keys to Projects and title,year,award as primary key.
result can be 'won' or 'nominated' award is stuff like 'best customer review' , 'fastest project', 'most profitable' etc...
I also have this one column list created with WITH called years_sp containing certain years.
I need to output every manager that has managed at least one project that won at least one award in EACH ONE of the years in years_sp.
I'm fairly new to sql so the best i could come up with was
SELECT Projects.manager
FROM (Projects CROSS JOIN ProjectAwards)
INNER JOIN years_sp
ON Movies.year = years_sp.sp
WHERE Projects.title = ProjectAwards.title AND Project.year=ProjectAwards.year AND ProjectAwards.result='won'
GROUP BY Movie.director
Having COUNT(*) > COUNT(years_sp)
but this gives the managers that managed a successful project on at least one of the years in year_sp not ALL the years in year_sp.
I quite frankly have no idea how to proceed, there is nothing like FOR EVERY in sql that i can think of. Any suggestion would be apreciated.
CodePudding user response:
You are correct actually Matteo that you don't strictly need LEFT JOIN ... maybe I was making no sense about that anyway. But be that as it may, here is a solution that follows your lead in evaluating the count of years as the determining factor, and the only thing we need to do is first ensure we are counting DISTINCT rows for each manager/year.
Also thanks to jian for a good starting point as well.
Apologies this is written in a SQL Server dialect, but I'm sure you can make the needed adjustments for Postgres.
-- outer query will count rows where an award was won
-- and filter to those where the count is the same as the count of years
select manager, count(manager) as CountOfYearsWithAwards
from
(
-- inner query will result on one DISTINCT row
-- for each manager/year where an award was one
-- in a relevant year from the years_sp table
select distinct p.manager, p.[year]
from #project p
inner join #projectawards pa
on p.title = pa.title
and p.[year] = pa.[year]
inner join #years_sp y -- only look at years that are in the years_sp table!
on y.sp = pa.[year]
where pa.result = 'won'
) winners
group by winners.manager
having count(manager) = (select count(*) from #years_sp)
Here is my test data (also created in a SQL Server dialect):
drop table if exists #project, #projectawards, #years_sp;
create table #project(
manager nvarchar(30),
title nvarchar(30),
[year] int,
primary key clustered (title, [year]));
create table #projectawards(
title nvarchar(30),
[year] int,
award nvarchar(30),
result nvarchar(30),
primary key clustered (title, [year], award));
create table #years_sp(sp int);
insert into #years_sp (sp) values(2017),(2018),(2019),(2020);
insert into #project (manager, title, [year]) values('m1','t1',2019);
insert into #project (manager, title, [year]) values('m2','t1',2020);
insert into #project (manager, title, [year]) values('m3','t2',2018);
insert into #project (manager, title, [year]) values('m1','t3',2017);
insert into #project (manager, title, [year]) values('m1','t3',2019);
insert into #project (manager, title, [year]) values('m4','t4',2017);
insert into #project (manager, title, [year]) values('m4','t4',2018);
insert into #project (manager, title, [year]) values('m4','t4',2019);
insert into #project (manager, title, [year]) values('m4','t4',2021);
insert into #project (manager, title, [year]) values('m5','t5',2017);
insert into #project (manager, title, [year]) values('m5','t5',2018);
insert into #project (manager, title, [year]) values('m5','t5',2019);
insert into #project (manager, title, [year]) values('m5','t5',2020);
insert into #projectawards (title, [year], award, result) values('t1',2019,'gold','nominated');
insert into #projectawards (title, [year], award, result) values('t1',2020,'gold','nominated');
insert into #projectawards (title, [year], award, result) values('t2',2018,'gold','won');
insert into #projectawards (title, [year], award, result) values('t3',2019,'gold','won');
insert into #projectawards (title, [year], award, result) values('t3',2019,'brown','nominated');
insert into #projectawards (title, [year], award, result) values('t4',2017,'brown','won');
insert into #projectawards (title, [year], award, result) values('t4',2018,'brown','won');
insert into #projectawards (title, [year], award, result) values('t4',2019,'brown','won');
insert into #projectawards (title, [year], award, result) values('t4',2021,'brown','won');
insert into #projectawards (title, [year], award, result) values('t5',2017,'brown','won');
insert into #projectawards (title, [year], award, result) values('t5',2018,'brown','won');
insert into #projectawards (title, [year], award, result) values('t5',2019,'brown','won');
insert into #projectawards (title, [year], award, result) values('t5',2020,'brown','won');
insert into #projectawards (title, [year], award, result) values('t5',2020,'gold','won');
The query below is not an answer to the question, but it is a similar result that shows you which years each manager won or did not win - so for instance dumping this in a temp table you can also make the relevant counts and quickly arrive at the same answer (and here we do use a left join strategy to see the haves and have nots all together):
select years.sp, years.manager, winners.manager as won
from
(select t1.sp, t2.manager
from #years_sp t1
cross join (select distinct manager from #project) t2) years
left join
(
select distinct p.manager, p.[year]
from #project p
inner join #projectawards pa
on p.title = pa.title
and p.[year] = pa.[year]
where pa.result = 'won'
) winners
on years.[sp] = winners.[year]
and years.manager = winners.manager