I am trying to get the number of red-card fouls per team per year for soccer.
Here is my sample data, followed by the DDL and DML.
soccer_team
team_id | team_name |
---|---|
1 | Team A |
2 | Team B |
3 | Team C |
match
match_id | match_date |
---|---|
1 | 01-01-2018 |
2 | 02-03-2018 |
3 | 05-04-2018 |
4 | 01-01-2018 |
5 | 02-02-2019 |
6 | 07-04-2019 |
7 | 06-13-2019 |
8 | 08-18-2019 |
9 | 03-01-2020 |
10 | 06-02-2020 |
11 | 09-03-2020 |
12 | 12-04-2020 |
foul
foul_id | team_id | match_id | foul_type |
---|---|---|---|
1 | 1 | 1 | Y |
2 | 3 | 5 | Y |
3 | 3 | 10 | Y |
4 | 3 | 10 | R |
5 | 2 | 11 | Y |
6 | 2 | 11 | R |
7 | 3 | 11 | R |
8 | 3 | 11 | R |
DDL DML:
CREATE TABLE dbo.soccer_team (
team_id int NOT NULL IDENTITY,
team_name varchar(50) NOT NULL,
CONSTRAINT PK_soccer_team PRIMARY KEY ( team_id )
);
CREATE TABLE dbo."match" (
match_id int NOT NULL IDENTITY,
match_date date NOT NULL,
CONSTRAINT PK_match PRIMARY KEY ( match_id )
);
CREATE TABLE dbo.foul (
foul_id int NOT NULL IDENTITY,
team_id int NOT NULL,
match_id int NOT NULL,
foul_type char(1) NOT NULL,
CONSTRAINT PK_foul PRIMARY KEY ( foul_id ),
CONSTRAINT FK_foul_match FOREIGN KEY(match_id) REFERENCES dbo.match (match_id),
CONSTRAINT FK_foul_soccer_team FOREIGN KEY(team_id) REFERENCES dbo.soccer_team (team_id)
);
----
SET IDENTITY_INSERT dbo.soccer_team ON;
SET IDENTITY_INSERT dbo."match" ON;
SET IDENTITY_INSERT dbo.foul ON;
INSERT INTO dbo.soccer_team ( team_id, team_name )
VALUES
(1, 'Team A'),
(2, 'Team B'),
(3, 'Team C');
INSERT INTO dbo.match ( match_id, match_date )
VALUES
(1, '01-01-2018'),
(2, '02-03-2018'),
(3, '05-04-2018'),
(4, '01-01-2018'),
(5, '02-02-2019'),
(6, '07-04-2019'),
(7, '06-13-2019'),
(8, '08-18-2019'),
(9, '03-01-2020'),
(10, '06-02-2020'),
(11, '09-03-2020'),
(12, '12-04-2020');
INSERT INTO dbo.foul (foul_id, team_id, match_id, foul_type )
VALUES
(1, 1, 1, 'Y'),
(2, 3, 5, 'Y'),
(3, 3, 10, 'Y'),
(4, 3, 10, 'R'),
(5, 2, 11, 'Y'),
(6, 2, 11, 'R'),
(7, 3, 11, 'R'),
(8, 3, 11, 'R');
SET IDENTITY_INSERT dbo.soccer_team OFF;
SET IDENTITY_INSERT dbo."match" OFF;
SET IDENTITY_INSERT dbo.foul OFF;
I can do some basic stuff like getting the number of red card fouls per team or per year:
SELECT
YEAR( match_date ) AS yearly_season,
COUNT( foul_type ) AS total_red_cards
FROM
dbo.foul AS df
RIGHT JOIN dbo.match AS dm ON
df.match_id = dm.match_id
AND
df.foul_type = 'R'
GROUP BY
YEAR( match_date )
and
SELECT
team_name,
COUNT( foul_type ) AS total_red_cards
FROM
dbo.foul AS df
RIGHT JOIN dbo.soccer_team AS dst ON
df.team_id = dst.team_id
AND
df.foul_type = 'R'
GROUP BY
team_name
Both work just fine, including for teams or years with zero red-card fouls, and return results like:
Year | Total red cards |
---|---|
2018 | 0 |
2019 | 0 |
2020 | 4 |
and
Team | Total red cards |
---|---|
Team A | 0 |
Team B | 1 |
Team C | 3 |
But I am unable to get the following which is my requirement:
Year | Team | Total red cards |
---|---|---|
2018 | Team A | 0 |
2018 | Team B | 0 |
2018 | Team C | 0 |
2019 | Team A | 0 |
2019 | Team B | 0 |
2019 | Team C | 0 |
2020 | Team A | 0 |
2020 | Team B | 1 |
2020 | Team C | 3 |
Instead, I am getting the following using queries like:
SELECT
YEAR( match_date ) AS yearly_season,
team_name,
COUNT( foul_type ) AS total_red_cards
FROM
dbo.foul AS df
INNER JOIN dbo.soccer_team AS dst ON
df.team_id = dst.team_id
AND
df.foul_type = 'R'
RIGHT JOIN dbo.match AS dm ON
df.match_id = dm.match_id
GROUP BY
YEAR( match_date ),
team_name
Year | Team | Total red cards |
---|---|---|
2018 | NULL | 0 |
2019 | NULL | 0 |
2020 | NULL | 0 |
2020 | Team B | 1 |
2020 | Team C | 3 |
or
Year | Team | Total red cards |
---|---|---|
2020 | Team B | 1 |
2020 | Team C | 3 |
Or, even worse, if I use JOINs starting from fouls all the way to matches (the numbers don't add up):
Year | Team | Total red cards |
---|---|---|
2018 | NULL | 0 |
2019 | NULL | 0 |
2020 | NULL | 0 |
2020 | Team B | 3 |
2020 | Team C | 12 |
Maybe some sub-queries might do the job? Or is there a problem with the database structure?
CodePudding user response:
Here is one way to do it. In situations like this, where there isn't a data point for every row you want returned, you first have to build the required result set, in this case using a CROSS JOIN of the teams and the years.
Then, once you have the desired rows, you can use a correlated sub-query on each row to count the red-card fouls.
SELECT Y.MatchYear, T.team_name
, (
SELECT COUNT(*)
FROM #foul F
INNER JOIN #match M ON M.match_id = F.match_id
WHERE F.team_id = T.team_id AND YEAR(M.match_date) = Y.MatchYear AND F.foul_type = 'R'
) AS total_red_cards
FROM #soccer_team T
CROSS JOIN (
SELECT YEAR([match_date]) MatchYear
FROM #match
GROUP BY YEAR([match_date])
) Y
ORDER BY Y.MatchYear ASC, T.team_name ASC;
Note I used temp tables as I didn't want permanent tables in my database, so just change the table names.
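To see the first step on its own, here is a minimal sketch of just the scaffold that the CROSS JOIN builds, before any counting happens. It assumes the permanent dbo tables from the question's DDL rather than the temp tables used above, and it uses SELECT DISTINCT in place of GROUP BY to list the years; both choices are illustrative.

```sql
-- Sketch: one row per (year, team) combination, whether or not any foul exists.
-- Assumes dbo.soccer_team and dbo.match as created by the question's DDL.
SELECT Y.MatchYear, T.team_name
FROM dbo.soccer_team AS T
CROSS JOIN (
    -- every distinct year in which at least one match was played
    SELECT DISTINCT YEAR(match_date) AS MatchYear
    FROM dbo.match
) AS Y
ORDER BY Y.MatchYear ASC, T.team_name ASC;
```

With the sample data this should return 9 rows (3 years times 3 teams), which is exactly the set of rows the required output lists. The red-card count is then attached to each of these rows.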
CodePudding user response:
Here is an alternative to Dale's solution (which is perfectly fine). Since you want to count rows that do not exist (a team that has no foul in a given year), you need to "construct" the domain (all teams, all years) somehow. CROSS JOIN is the SQL term for the Cartesian product of two sets:
select *
from dbo.match m
cross join dbo.soccer_team t
We call this relation x, for lack of a better word:
from (
select *
from dbo.match m
cross join dbo.soccer_team t
) as x
Now we can join this with foul to get the year, the team name and the related cards. Note that we need an outer join, since not every team has a foul in every match:
select year(x.match_date), x.team_name
, f.foul_type
from (
select *
from dbo.match m
cross join dbo.soccer_team t
) as x
left join dbo.foul f
on f.match_id = x.match_id
and f.team_id = x.team_id
Instead of adding a filter in a WHERE clause, we can apply it inside the aggregate function. COUNT(x) counts every x that is not null. Absent rows are already null, so we only need to handle cards other than 'R':
case when f.foul_type = 'R' then 1 end
This maps every foul_type other than 'R' to null, so those rows are not considered by COUNT:
select year(x.match_date), x.team_name
, COUNT(case when f.foul_type = 'R' then 1 end)
from (
select *
from dbo.match m
cross join dbo.soccer_team t
) as x
left join dbo.foul f
on f.match_id = x.match_id
and f.team_id = x.team_id
GROUP BY year(x.match_date), x.team_name
ORDER BY year(x.match_date), x.team_name
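The NULL-skipping behaviour of COUNT that the query above relies on can be checked in isolation. The following is a small sketch using an inline VALUES list; the alias v and the four sample values are made up for illustration and are not part of the question's data.

```sql
-- Sketch: COUNT(*) counts rows, COUNT(expr) counts only non-NULL values of expr.
-- The derived table v is hypothetical sample data: one yellow, two reds, one NULL
-- (the NULL stands in for a row the LEFT JOIN could not match).
SELECT
    COUNT(*)                                        AS all_rows,       -- 4
    COUNT(v.foul_type)                              AS non_null_cards, -- 3
    COUNT(CASE WHEN v.foul_type = 'R' THEN 1 END)   AS red_cards       -- 2
FROM (VALUES ('Y'), ('R'), ('R'), (NULL)) AS v(foul_type);
```

Because the CASE expression has no ELSE branch, it yields NULL for 'Y' and for the unmatched row, which is why a team with no red cards in a year ends up with a count of 0 rather than disappearing from the result.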
Fiddle: I added a modified version of Dale's solution (with the # prefixes removed).
BTW, well done on improving your post. Without the DDL I would never have considered answering the question. If you want to improve your future posts even more, you can add a db<>fiddle.