How do I count occurrences with conditions in PostgreSQL?-CodePudding

I'm working in PostgreSQL. Suppose I have this Person table:

| id |          time         | name  |    type     |
----------------------------------------------------
|  1 | 2022-04-25 07:49:58.0 | Brian | Rejection 1 |
|  2 | 2022-04-25 07:49:58.0 | Brian | Rejection 2 |
|  3 | 2022-04-27 13:05:51.0 | Fredd | Rejection 1 |
|  4 | 2022-05-01 02:13:44.0 | Janet | Rejection 1 |
|  5 | 2022-05-01 03:45:06.0 | Janet | Rejection 2 |
|  6 | 2022-05-01 08:01:34.0 | Peter |   Approval  |
|  7 | 2022-05-01 12:12:53.0 | Frank | Rejection 2 |
|  8 | 2022-05-02 01:26:38.0 | Frank |   Approval  |

Note: We have 2 rejections types Rejection 1 and Rejection 2.

I would like to make a query that counts the number of Rejections and the number of approvals for each name. However if there are 2 rejections at the same time, for the same name, like the two first rows in the example, it should only count as one.

Let me just add that it's possible for there to be one of each rejection types at the same time for the same name, but it's impossible for there to be two rejections of the same type at the same time for the same name.

So this is what I'm expecting it to return:

| name  | approvals | rejections |
----------------------------------
| Brian |     0     |     1      |
| Fredd |     0     |     1      |
| Janet |     0     |     2      |
| Peter |     1     |     0      |
| Frank |     1     |     1      |

The closest I could get to this is the following:

SELECT 
    name,
    COALESCE(SUM(CASE WHEN log_type = 'Approval' THEN 1 ELSE 0 END), 0) approvals,
    COALESCE(SUM(CASE WHEN log_type = 'Rejection 1' OR log_type = 'Rejection 2' THEN 1 ELSE 0 END), 0) rejections
FROM 
    person
GROUP BY 
    name

The problem with this is that it counts two rejections with same time and name as 2 instead of 1.

CodePudding user response：

Use ROW_NUMBER to remove duplicates, then use a simple count query to find the counts:

SELECT
    name,
    COUNT(*) FILTER (WHERE log_type = 'Approval') approvals,
    COUNT(*) FILTER (WHERE log_type LIKE 'Rejection%') rejections
FROM
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY time, name, SUBSTRING(log_type FROM '\w ')) rn
    FROM person
) t
WHERE rn = 1
GROUP BY name;

CodePudding user response：

You can use DISTINCT inside COUNT() to count the distinct times if log_type is the 'Rejection X':

SELECT name,
       COUNT(CASE WHEN log_type = 'Approval' THEN 1 END) approvals,
       COUNT(DISTINCT CASE WHEN log_type IN ('Rejection 1', 'Rejection 2') THEN time END) rejections
FROM person
GROUP BY name;

See the demo.

CodePudding user response：

We can fetch the date in the CASE and then use DISTINCT COUNT, which ignores nulls.
I have given a first query to show the intermediate results and the counts with and without DISTINCT to show what it is doing. I have used the test LEFT(log_type,6) = 'Reject' to group the 2 rejection types.
I suggest that it would be a good idea to round the time so that 2 rejections close together will be treated as repetitions. We the current queries event 1 second different will be treated as a different rejection.

create table person(
  id int,
  time date,
  name varchar(20),
  log_type varchar(20));
  insert into person values
(  1,'2022-04-25 07:49:58.0','Brian','Rejection 1'),
(  2,'2022-04-25 07:49:58.0','Brian','Rejection 2'),
(  3,'2022-04-27 13:05:51.0','Fredd','Rejection 1'),
(  4,'2022-05-01 02:13:44.0','Janet','Rejection 1'),
(  5,'2022-05-01 03:45:06.0','Janet','Rejection 2'),
(  6,'2022-05-01 08:01:34.0','Peter','Approval'),
(  7,'2022-05-01 12:12:53.0','Frank','Rejection 2'),
(  8,'2022-05-02 01:26:38.0','Frank','Approval');

✓

8 rows affected

SELECT
  name,
  CASE WHEN LEFT(log_type,6) = 'Reject' THEN time END R,
  CASE WHEN log_type = 'Approval' THEN time END A
FROM person;

name  | r          | a         
:---- | :--------- | :---------
Brian | 2022-04-25 | null      
Brian | 2022-04-25 | null      
Fredd | 2022-04-27 | null      
Janet | 2022-05-01 | null      
Janet | 2022-05-01 | null      
Peter | null       | 2022-05-01
Frank | 2022-05-01 | null      
Frank | null       | 2022-05-02

SELECT
  name,
  COUNT(CASE WHEN LEFT(log_type,6) = 'Reject' THEN time END) all_rejections,
  COUNT(CASE WHEN log_type = 'Approval' THEN time END) all_approvals,
  COUNT(DISTINCT CASE WHEN LEFT(log_type,6) = 'Reject' THEN time END) distinct_rejections,
  COUNT(DISTINCT CASE WHEN log_type = 'Approval' THEN time END) distinct_approvals
FROM person
GROUP BY name;

name  | all_rejections | all_approvals | distinct_rejections | distinct_approvals
:---- | -------------: | ------------: | ------------------: | -----------------:
Brian |              2 |             0 |                   1 |                  0
Frank |              1 |             1 |                   1 |                  1
Fredd |              1 |             0 |                   1 |                  0
Janet |              2 |             0 |                   1 |                  0
Peter |              0 |             1 |                   0 |                  1

db<>fiddle here