Example dataset.
CLINIC | APPTDATETIME | PATIENT_ID | NEW_FOLLOWUP_FLAG |
---|---|---|---|
TGYN | 20/07/2022 09:00:00 | 1 | N |
TGYN | 20/07/2022 09:45:00 | 2 | F |
TGYN | 20/07/2022 10:05:00 | NULL | NULL |
TGYN | 20/07/2022 10:05:00 | 4 | F |
TGYN | 20/07/2022 10:25:00 | 5 | F |
TGYN | 20/07/2022 10:30:00 | NULL | NULL |
TGYN | 20/07/2022 10:35:00 | NULL | NULL |
TGYN | 20/07/2022 10:40:00 | NULL | NULL |
TGYN | 20/07/2022 10:45:00 | NULL | NULL |
TGYN | 20/07/2022 11:10:00 | 6 | F |
TGYN | 20/07/2022 11:10:00 | 7 | F |
As you can see there are times with multiple patients, times with empty slots and times with both (generally DQ errors).
I'm trying to calculate how many slots where filled and how many of those were new (N) or follow up(F). If there is a slot with a patient and also a NULL row then I only want to count the row with the patient. If there are only NULL rows for a timeslot then I want to count that as 'unfilled'.
From this dataset I would like to calculate the following for each group of clinic and apptdatetime.
CLINIC | APPTDATE | N Capacity | F Capacity | Unfilled Capacity |
---|---|---|---|---|
TGYN | 20/07/2022 | 1 | 5 | 4 |
What's the best way to go about this?
I've considered taking a list of distinct values for each clinic and date and then joining to that but wanted to know if there are a more elegant way.
CodePudding user response:
First I set up some demo data in a table from what you provided:
DECLARE @table TABLE (CLINIC NVARCHAR(4), APPTDATETIME DATETIME, PATIENT_ID INT, NEW_FOLLOWUP_FLAG NVARCHAR(1))
INSERT INTO @table (CLINIC, APPTDATETIME, PATIENT_ID, NEW_FOLLOWUP_FLAG) VALUES
('TGYN','07/20/2022 09:00:00', 1 ,'N'),
('TGYN','07/20/2022 09:45:00', 2 ,'F'),
('TGYN','07/20/2022 10:05:00', NULL ,NULL),
('TGYN','07/20/2022 10:05:00', 4 ,'F'),
('TGYN','07/20/2022 10:25:00', 5 ,'F'),
('TGYN','07/20/2022 10:30:00', NULL ,NULL),
('TGYN','07/20/2022 10:35:00', NULL ,NULL),
('TGYN','07/20/2022 10:40:00', NULL ,NULL),
('TGYN','07/20/2022 10:45:00', NULL ,NULL),
('TGYN','07/20/2022 11:10:00', 6 ,'F'),
('TGYN','07/20/2022 11:10:00', 7 ,'F')
Reading through your description it looks like you'd need a couple of case statements and a group by:
SELECT CLINIC, CAST(APPTDATETIME AS DATE) AS APPTDATE,
SUM(CASE WHEN NEW_FOLLOWUP_FLAG = 'N' THEN 1 ELSE 0 END) AS NCapacity,
SUM(CASE WHEN NEW_FOLLOWUP_FLAG = 'F' THEN 1 ELSE 0 END) AS FCapacity,
SUM(CASE WHEN NEW_FOLLOWUP_FLAG IS NULL THEN 1 ELSE 0 END) AS UnfilledCapacity
FROM @table
GROUP BY CLINIC, CAST(APPTDATETIME AS DATE)
Which returns a result set like this:
CLINIC APPTDATE NCapacity FCapacity UnfilledCapacity
------------------------------------------------------------
TGYN 2022-07-20 1 5 5
Note that I cast the datetime column to a date and grouped by that. The case statements just test for a condition (is the column null, or F or N) and then just returns a 1, which is summed.
Your title also asked about finding duplicates in the data set. You should likely have a constraint on this table making CLINIC and APPTDATETIME forcibly unique. This would prevent rows even being inserted as dupes.
If you want to find them in the table try something like this:
SELECT CLINIC, APPTDATETIME, COUNT(*) AS Cnt
FROM @table
GROUP BY CLINIC, APPTDATETIME
HAVING COUNT(*) > 1
Which from the test data returned:
CLINIC APPTDATETIME Cnt
-----------------------------------
TGYN 2022-07-20 10:05:00.000 2
TGYN 2022-07-20 11:10:00.000 2
Indicating there are dupes for those clinic/datetime combinations.
HAVING is the magic here, we can count them up and state we only want ones which are greater than 1.
CodePudding user response:
This is basically a straight-forward conditional aggregation with group by, with the slight complication of excluding NULL rows where a corresponding appointment also exists.
For this you can include an anti-semi self-join using not exists so as to exclude counting for unfilled capacity any row where there's also valid data for the same date:
select CLINIC, Convert(date, APPTDATETIME) AppDate,
Sum(case when NEW_FOLLOWUP_FLAG = 'N' then 1 end) N_Capacity,
Sum(case when NEW_FOLLOWUP_FLAG = 'f' then 1 end) F_Capacity,
Sum(case when NEW_FOLLOWUP_FLAG is null then 1 end) U_Capacity
from t
where not exists (
select * from t t2
where t.PATIENT_ID is null
and t2.PATIENT_ID is not null
and t.APPTDATETIME = t2.APPTDATETIME
)
group by CLINIC, Convert(date, APPTDATETIME);