Getting number of fouls per team per year for soccer

Time:10-01

I am trying to get the number of red-card fouls per team per year for soccer.

Here is my sample data, followed by the DDL and DML.

soccer_team

team_id team_name
1 Team A
2 Team B
3 Team C

match

match_id match_date
1 01-01-2018
2 02-03-2018
3 05-04-2018
4 01-01-2018
5 02-02-2019
6 07-04-2019
7 06-13-2019
8 08-18-2019
9 03-01-2020
10 06-02-2020
11 09-03-2020
12 12-04-2020

foul

foul_id team_id match_id foul_type
1 1 1 Y
2 3 5 Y
3 3 10 Y
4 3 10 R
5 2 11 Y
6 2 11 R
7 3 11 R
8 3 11 R

DDL and DML:

CREATE TABLE dbo.soccer_team (
    team_id   int         NOT NULL IDENTITY,
    team_name varchar(50) NOT NULL,
    
    CONSTRAINT PK_soccer_team PRIMARY KEY ( team_id )
);

CREATE TABLE dbo."match" (
    match_id   int  NOT NULL IDENTITY,
    match_date date NOT NULL,
    
    CONSTRAINT PK_match PRIMARY KEY ( match_id )
);

CREATE TABLE dbo.foul (
    foul_id   int     NOT NULL IDENTITY,
    team_id   int     NOT NULL,
    match_id  int     NOT NULL,
    foul_type char(1) NOT NULL,
    
    CONSTRAINT PK_foul PRIMARY KEY ( foul_id ),

    CONSTRAINT FK_foul_match FOREIGN KEY(match_id) REFERENCES dbo.match (match_id),
    CONSTRAINT FK_foul_soccer_team FOREIGN KEY(team_id) REFERENCES dbo.soccer_team (team_id)
);
    
----

SET IDENTITY_INSERT dbo.soccer_team ON;
SET IDENTITY_INSERT dbo."match" ON;
SET IDENTITY_INSERT dbo.foul ON;

INSERT INTO dbo.soccer_team ( team_id, team_name )
VALUES
(1, 'Team A'),
(2, 'Team B'),
(3, 'Team C');

INSERT INTO dbo.match ( match_id, match_date )
VALUES
(1, '01-01-2018'),
(2, '02-03-2018'),
(3, '05-04-2018'),
(4, '01-01-2018'),
(5, '02-02-2019'),
(6, '07-04-2019'),
(7, '06-13-2019'),
(8, '08-18-2019'),
(9, '03-01-2020'),
(10, '06-02-2020'),
(11, '09-03-2020'),
(12, '12-04-2020');

INSERT INTO dbo.foul (foul_id, team_id, match_id, foul_type )
VALUES
(1, 1, 1, 'Y'),
(2, 3, 5, 'Y'),
(3, 3, 10, 'Y'),
(4, 3, 10, 'R'),
(5, 2, 11, 'Y'),
(6, 2, 11, 'R'),
(7, 3, 11, 'R'),
(8, 3, 11, 'R');

SET IDENTITY_INSERT dbo.soccer_team OFF;
SET IDENTITY_INSERT dbo."match" OFF;
SET IDENTITY_INSERT dbo.foul OFF;

I can do some basic stuff like getting the number of red card fouls per team or per year:

SELECT
    YEAR( match_date ) AS yearly_season,
    COUNT( foul_type ) AS total_red_cards 
FROM
    dbo.foul AS df
    RIGHT JOIN dbo.match AS dm ON
        df.match_id = dm.match_id
        AND
        df.foul_type = 'R'

GROUP BY
    YEAR( match_date )

and

SELECT
    team_name,
    COUNT( foul_type ) AS total_red_cards 

FROM
    dbo.foul AS df
    RIGHT JOIN dbo.soccer_team AS dst ON
        df.team_id = dst.team_id
        AND
        df.foul_type = 'R'

GROUP BY
    team_name

These work just fine, including for teams or years with zero red cards, and give results like:

Year Total red cards
2018 0
2019 0
2020 4

and

Team Total red cards
Team A 0
Team B 1
Team C 3

But I am unable to get the following, which is what I actually need:

Year Team Total red cards
2018 Team A 0
2018 Team B 0
2018 Team C 0
2019 Team A 0
2019 Team B 0
2019 Team C 0
2020 Team A 0
2020 Team B 1
2020 Team C 3

Instead, I am getting the following using queries like:


SELECT
    YEAR( match_date ) AS yearly_season,
    team_name, 
    COUNT( foul_type ) AS total_red_cards

FROM
    dbo.foul AS df
    INNER JOIN dbo.soccer_team AS dst ON
        df.team_id = dst.team_id
        AND
        df.foul_type = 'R'

    RIGHT JOIN dbo.match AS dm ON
        df.match_id = dm.match_id

GROUP BY
    YEAR( match_date ),
    team_name

Year Team Total red cards
2018 NULL 0
2019 NULL 0
2020 NULL 0
2020 Team B 1
2020 Team C 3

or

Year Team Total red cards
2020 Team B 1
2020 Team C 3

Or, even worse, if I use joins starting from foul all the way to match (the numbers don't add up):

Year Team Total red cards
2018 NULL 0
2019 NULL 0
2020 NULL 0
2020 Team B 3
2020 Team C 12

Maybe some sub-queries might do the job? Or is there a problem with the database structure?

CodePudding user response:

Here is one way to do it. For situations like this, where there isn't a data point for every result you want returned, you first have to build the required result set, in this case using a CROSS JOIN of the teams and the years.

Then, once you have the desired rows, you can use a sub-query for each row to obtain the red card fouls.

SELECT Y.MatchYear, T.team_name
    , (
        -- Correlated sub-query: red cards for this team in this year
        SELECT COUNT(*)
        FROM #foul F
        INNER JOIN #match M ON M.match_id = F.match_id
        WHERE F.team_id = T.team_id AND YEAR(M.match_date) = Y.MatchYear AND F.foul_type = 'R'
    ) AS total_red_cards
FROM #soccer_team T
-- One row per distinct year in which a match was played
CROSS JOIN (
    SELECT YEAR([match_date]) MatchYear
    FROM #match
    GROUP BY YEAR([match_date])
) Y
ORDER BY Y.MatchYear ASC, T.team_name ASC;

Note I used temp tables as I didn't want permanent tables in my database, so just change the table names.
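
To see the scaffold on its own, here is a minimal sketch of just the CROSS JOIN step, written against the dbo tables from the question rather than the temp tables. It uses DISTINCT where the query above uses GROUP BY, which is only a stylistic assumption; MatchYear is the alias from the answer.

```sql
-- Sketch: build the full year x team grid before counting anything.
-- Every team is paired with every year that has at least one match.
SELECT Y.MatchYear, T.team_name
FROM dbo.soccer_team AS T
CROSS JOIN (
    SELECT DISTINCT YEAR(match_date) AS MatchYear
    FROM dbo."match"
) AS Y
ORDER BY Y.MatchYear, T.team_name;
```

With the sample data this should return 9 rows (3 years x 3 teams), one for each line of the desired output. The counts are then attached to these rows.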

CodePudding user response:

An alternative to Dale's solution (which is perfectly fine). Since you want to count rows that do not exist (a team that does not have a foul in a given year), you need to "construct" the domain (all teams, all years) somehow. CROSS JOIN is the SQL lingo for the Cartesian product of two sets:

select *
from dbo.match m
cross join dbo.soccer_team t

We call this relation x, for lack of a better name:

from ( 
  select *
  from dbo.match m
  cross join dbo.soccer_team t
) as x

Now we can join this with foul to get the year, the team name and the related cards. Note that we need an outer join, since not every team has a foul each year:

select year(x.match_date), x.team_name
     , f.foul_type
from ( 
  select *
  from dbo.match m
  cross join dbo.soccer_team t
) as x
left join dbo.foul f
    on f.match_id = x.match_id
    and f.team_id = x.team_id

Instead of adding a filter in a WHERE clause, we can put it inside the aggregate function. COUNT(x) counts every x that is not null. Absent rows are already null, so we only need to handle cards other than 'R':

case when f.foul_type = 'R' then 1 end

will map every foul_type other than 'R' to null, hence they will not be considered by COUNT:

select year(x.match_date), x.team_name
     , COUNT(case when f.foul_type = 'R' then 1 end) 
from ( 
  select *
  from dbo.match m
  cross join dbo.soccer_team t
) as x
left join dbo.foul f
    on f.match_id = x.match_id
    and f.team_id = x.team_id
GROUP BY year(x.match_date), x.team_name
ORDER BY year(x.match_date), x.team_name
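
The NULL-skipping behaviour of COUNT that this query relies on can be checked in isolation. The following is a small sketch using an inline VALUES list; the derived table v and its four rows are made up purely for illustration and are not part of the question's data.

```sql
-- Sketch: how COUNT treats NULLs, on four hypothetical rows.
SELECT
    COUNT(*)                                      AS all_rows,       -- counts every row: 4
    COUNT(v.foul_type)                            AS non_null_rows,  -- skips the NULL: 3
    COUNT(CASE WHEN v.foul_type = 'R' THEN 1 END) AS red_cards       -- only the 'R' rows: 2
FROM (VALUES ('Y'), ('R'), ('R'), (NULL)) AS v(foul_type);
```

The CASE expression has no ELSE branch, so anything that is not 'R' (including the NULL produced by the outer join for a missing foul) becomes NULL and is left out of the count.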

Fiddle: I added a modified version of Dale's solution (with the # prefixes removed).

BTW, well done on improving your post. Without the DDL I would never have considered answering the question. If you want to improve your future posts even more, you can add a db<>fiddle.
