I'm working in PostgreSQL. Suppose I have this Person
table:
| id | time | name | type |
----------------------------------------------------
| 1 | 2022-04-25 07:49:58.0 | Brian | Rejection 1 |
| 2 | 2022-04-25 07:49:58.0 | Brian | Rejection 2 |
| 3 | 2022-04-27 13:05:51.0 | Fredd | Rejection 1 |
| 4 | 2022-05-01 02:13:44.0 | Janet | Rejection 1 |
| 5 | 2022-05-01 03:45:06.0 | Janet | Rejection 2 |
| 6 | 2022-05-01 08:01:34.0 | Peter | Approval |
| 7 | 2022-05-01 12:12:53.0 | Frank | Rejection 2 |
| 8 | 2022-05-02 01:26:38.0 | Frank | Approval |
Note: We have 2 rejections types Rejection 1
and Rejection 2
.
I would like to make a query that counts the number of Rejections and the number of approvals for each name. However if there are 2 rejections at the same time, for the same name, like the two first rows in the example, it should only count as one.
Let me just add that it's possible for there to be one of each rejection types at the same time for the same name, but it's impossible for there to be two rejections of the same type at the same time for the same name.
So this is what I'm expecting it to return:
| name | approvals | rejections |
----------------------------------
| Brian | 0 | 1 |
| Fredd | 0 | 1 |
| Janet | 0 | 2 |
| Peter | 1 | 0 |
| Frank | 1 | 1 |
The closest I could get to this is the following:
SELECT
name,
COALESCE(SUM(CASE WHEN log_type = 'Approval' THEN 1 ELSE 0 END), 0) approvals,
COALESCE(SUM(CASE WHEN log_type = 'Rejection 1' OR log_type = 'Rejection 2' THEN 1 ELSE 0 END), 0) rejections
FROM
person
GROUP BY
name
The problem with this is that it counts two rejections with same time and name as 2 instead of 1.
CodePudding user response:
Use ROW_NUMBER
to remove duplicates, then use a simple count query to find the counts:
SELECT
name,
COUNT(*) FILTER (WHERE log_type = 'Approval') approvals,
COUNT(*) FILTER (WHERE log_type LIKE 'Rejection%') rejections
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY time, name, SUBSTRING(log_type FROM '\w ')) rn
FROM person
) t
WHERE rn = 1
GROUP BY name;
CodePudding user response:
You can use DISTINCT
inside COUNT()
to count the distinct time
s if log_type
is the 'Rejection X'
:
SELECT name,
COUNT(CASE WHEN log_type = 'Approval' THEN 1 END) approvals,
COUNT(DISTINCT CASE WHEN log_type IN ('Rejection 1', 'Rejection 2') THEN time END) rejections
FROM person
GROUP BY name;
See the demo.
CodePudding user response:
We can fetch the date in the CASE and then use DISTINCT COUNT, which ignores nulls.
I have given a first query to show the intermediate results and the counts with and without DISTINCT to show what it is doing. I have used the test LEFT(log_type,6) = 'Reject'
to group the 2 rejection types.
I suggest that it would be a good idea to round the time so that 2 rejections close together will be treated as repetitions. We the current queries event 1 second different will be treated as a different rejection.
create table person( id int, time date, name varchar(20), log_type varchar(20)); insert into person values ( 1,'2022-04-25 07:49:58.0','Brian','Rejection 1'), ( 2,'2022-04-25 07:49:58.0','Brian','Rejection 2'), ( 3,'2022-04-27 13:05:51.0','Fredd','Rejection 1'), ( 4,'2022-05-01 02:13:44.0','Janet','Rejection 1'), ( 5,'2022-05-01 03:45:06.0','Janet','Rejection 2'), ( 6,'2022-05-01 08:01:34.0','Peter','Approval'), ( 7,'2022-05-01 12:12:53.0','Frank','Rejection 2'), ( 8,'2022-05-02 01:26:38.0','Frank','Approval');
✓
8 rows affected
SELECT name, CASE WHEN LEFT(log_type,6) = 'Reject' THEN time END R, CASE WHEN log_type = 'Approval' THEN time END A FROM person;
name | r | a :---- | :--------- | :--------- Brian | 2022-04-25 | null Brian | 2022-04-25 | null Fredd | 2022-04-27 | null Janet | 2022-05-01 | null Janet | 2022-05-01 | null Peter | null | 2022-05-01 Frank | 2022-05-01 | null Frank | null | 2022-05-02
SELECT name, COUNT(CASE WHEN LEFT(log_type,6) = 'Reject' THEN time END) all_rejections, COUNT(CASE WHEN log_type = 'Approval' THEN time END) all_approvals, COUNT(DISTINCT CASE WHEN LEFT(log_type,6) = 'Reject' THEN time END) distinct_rejections, COUNT(DISTINCT CASE WHEN log_type = 'Approval' THEN time END) distinct_approvals FROM person GROUP BY name;
name | all_rejections | all_approvals | distinct_rejections | distinct_approvals :---- | -------------: | ------------: | ------------------: | -----------------: Brian | 2 | 0 | 1 | 0 Frank | 1 | 1 | 1 | 1 Fredd | 1 | 0 | 1 | 0 Janet | 2 | 0 | 1 | 0 Peter | 0 | 1 | 0 | 1
db<>fiddle here