I have a table with data something like the below (there are other columns as well like times but for the sake of this question I simplified it):
| Runner | Action |
|--------|--------|
| John | Start |
| Amy | Start |
| Susan | Start |
| John | Finish |
| Amy | Start |
| Amy | Finish |
I am trying to get a list of runners with a count of the races they started and a count of the races they finished and show only the racers who didn't finish all the races they started.
Desired output:
| Runner | StartCount| FinishCount|
|--------|-----------|------------|
| Amy | 2 | 1 |
| Susan | 1 | 0 |
To do this I'll need to compare the count of Action
column against another count of the same column, but I'm not sure if this can be done because you have to add a 'where' clause for the COUNT to work, and there would be two separate WHERE clauses.
SELECT Runner, COUNT(Action), * FROM RunnerActions
WHERE Action = 'Start'
Is this possible in SQL? What is the process of comparing two separate COUNTs on the same column in a single SQL query?
CodePudding user response:
Some DBMS, for example MYSQL which you taged as your DBMS, provide the possibility to use conditional sum:
SELECT
Runner,
SUM(Action = 'Start') AS StartCount,
SUM(Action = 'Finish') AS FinishCount
FROM RunnerActions
GROUP BY
Runner
HAVING
SUM(Action = 'Start') >
SUM(Action = 'Finish');
This is easy to read and we should prefer this in my opinion.
If we use a DBMS that doesn't provide this option, we can rebuild this logic using CASE WHEN
to only count the starts as one value and the finishes as another value:
SELECT
Runner,
COUNT(CASE WHEN Action = 'Start' THEN 1 END) AS StartCount,
COUNT(CASE WHEN Action = 'Finish' THEN 1 END) AS FinishCount
FROM RunnerActions
GROUP BY
Runner
HAVING
COUNT(CASE WHEN Action = 'Start' THEN 1 END) >
COUNT(CASE WHEN Action = 'Finish' THEN 1 END);
In both queries, we will GROUP BY
Runner to build the sum per runner.
Finally, in the HAVING
clause, we will put the condition more starts should appear then finishes.
Both queries will produce the same outcome.
They both will COUNT
or SUM
(this makes no difference in our use case) only those rows having the value "Start" as StartCount and only those having the value "Finish" as FinishCount.
If we remove the HAVING
clause, we will also get those runners who have the same amount of starts and finishes, i.e. for those runners, StartCount and FinishCount will be identic.
If we use the HAVING
clause, only those runners will be selected that have more starts than finishes.
Try out here: db<>fiddle