How to compare COUNT of two values in a single column in SQL?-CodePudding

I have a table with data something like the below (there are other columns as well like times but for the sake of this question I simplified it):

| Runner | Action |
|--------|--------|
| John   | Start  |
| Amy    | Start  |
| Susan  | Start  |
| John   | Finish |
| Amy    | Start  |
| Amy    | Finish |

I am trying to get a list of runners with a count of the races they started and a count of the races they finished and show only the racers who didn't finish all the races they started.

Desired output:

| Runner | StartCount| FinishCount|
|--------|-----------|------------|
| Amy    |      2    |     1      |
| Susan  |      1    |     0      |

To do this I'll need to compare the count of Action column against another count of the same column, but I'm not sure if this can be done because you have to add a 'where' clause for the COUNT to work, and there would be two separate WHERE clauses.

SELECT Runner, COUNT(Action), * FROM RunnerActions
WHERE Action = 'Start'

Is this possible in SQL? What is the process of comparing two separate COUNTs on the same column in a single SQL query?

CodePudding user response：

Some DBMS, for example MYSQL which you taged as your DBMS, provide the possibility to use conditional sum:

SELECT 
Runner, 
SUM(Action = 'Start') AS StartCount,
SUM(Action = 'Finish') AS FinishCount
FROM RunnerActions
GROUP BY
Runner
HAVING 
SUM(Action = 'Start') > 
SUM(Action = 'Finish');

This is easy to read and we should prefer this in my opinion.

If we use a DBMS that doesn't provide this option, we can rebuild this logic using CASE WHEN to only count the starts as one value and the finishes as another value:

SELECT 
Runner, 
COUNT(CASE WHEN Action = 'Start' THEN 1 END) AS StartCount,
COUNT(CASE WHEN Action = 'Finish' THEN 1 END) AS FinishCount
FROM RunnerActions
GROUP BY
Runner
HAVING 
COUNT(CASE WHEN Action = 'Start' THEN 1 END) > 
COUNT(CASE WHEN Action = 'Finish' THEN 1 END);

In both queries, we will GROUP BY Runner to build the sum per runner.

Finally, in the HAVING clause, we will put the condition more starts should appear then finishes.

Both queries will produce the same outcome.

They both will COUNT or SUM (this makes no difference in our use case) only those rows having the value "Start" as StartCount and only those having the value "Finish" as FinishCount.

If we remove the HAVING clause, we will also get those runners who have the same amount of starts and finishes, i.e. for those runners, StartCount and FinishCount will be identic.

If we use the HAVING clause, only those runners will be selected that have more starts than finishes.

Try out here: db<>fiddle