Calculating slots with double bookings and null val-CodePudding

Example dataset.

CLINIC	APPTDATETIME	PATIENT_ID	NEW_FOLLOWUP_FLAG
TGYN	20/07/2022 09:00:00	1	N
TGYN	20/07/2022 09:45:00	2	F
TGYN	20/07/2022 10:05:00	NULL	NULL
TGYN	20/07/2022 10:05:00	4	F
TGYN	20/07/2022 10:25:00	5	F
TGYN	20/07/2022 10:30:00	NULL	NULL
TGYN	20/07/2022 10:35:00	NULL	NULL
TGYN	20/07/2022 10:40:00	NULL	NULL
TGYN	20/07/2022 10:45:00	NULL	NULL
TGYN	20/07/2022 11:10:00	6	F
TGYN	20/07/2022 11:10:00	7	F

As you can see there are times with multiple patients, times with empty slots and times with both (generally DQ errors).

I'm trying to calculate how many slots where filled and how many of those were new (N) or follow up(F). If there is a slot with a patient and also a NULL row then I only want to count the row with the patient. If there are only NULL rows for a timeslot then I want to count that as 'unfilled'.

From this dataset I would like to calculate the following for each group of clinic and apptdatetime.

CLINIC	APPTDATE	N Capacity	F Capacity	Unfilled Capacity
TGYN	20/07/2022	1	5	4

What's the best way to go about this?

I've considered taking a list of distinct values for each clinic and date and then joining to that but wanted to know if there are a more elegant way.

CodePudding user response：

First I set up some demo data in a table from what you provided:

DECLARE @table TABLE (CLINIC NVARCHAR(4), APPTDATETIME DATETIME, PATIENT_ID INT, NEW_FOLLOWUP_FLAG NVARCHAR(1))
INSERT INTO @table (CLINIC, APPTDATETIME, PATIENT_ID, NEW_FOLLOWUP_FLAG) VALUES
('TGYN','07/20/2022 09:00:00', 1    ,'N'),
('TGYN','07/20/2022 09:45:00', 2    ,'F'),
('TGYN','07/20/2022 10:05:00', NULL ,NULL),
('TGYN','07/20/2022 10:05:00', 4    ,'F'),
('TGYN','07/20/2022 10:25:00', 5    ,'F'),
('TGYN','07/20/2022 10:30:00', NULL ,NULL),
('TGYN','07/20/2022 10:35:00', NULL ,NULL),
('TGYN','07/20/2022 10:40:00', NULL ,NULL),
('TGYN','07/20/2022 10:45:00', NULL ,NULL),
('TGYN','07/20/2022 11:10:00', 6    ,'F'),
('TGYN','07/20/2022 11:10:00', 7    ,'F')

Reading through your description it looks like you'd need a couple of case statements and a group by:

SELECT CLINIC, CAST(APPTDATETIME AS DATE) AS APPTDATE, 
       SUM(CASE WHEN NEW_FOLLOWUP_FLAG = 'N' THEN 1 ELSE 0 END) AS NCapacity, 
       SUM(CASE WHEN NEW_FOLLOWUP_FLAG = 'F' THEN 1 ELSE 0 END) AS FCapacity, 
       SUM(CASE WHEN NEW_FOLLOWUP_FLAG IS NULL THEN 1 ELSE 0 END) AS UnfilledCapacity
  FROM @table
 GROUP BY CLINIC, CAST(APPTDATETIME AS DATE)

Which returns a result set like this:

CLINIC  APPTDATE    NCapacity   FCapacity   UnfilledCapacity
------------------------------------------------------------
TGYN    2022-07-20  1           5           5

Note that I cast the datetime column to a date and grouped by that. The case statements just test for a condition (is the column null, or F or N) and then just returns a 1, which is summed.

Your title also asked about finding duplicates in the data set. You should likely have a constraint on this table making CLINIC and APPTDATETIME forcibly unique. This would prevent rows even being inserted as dupes.

If you want to find them in the table try something like this:

SELECT CLINIC, APPTDATETIME, COUNT(*) AS Cnt
  FROM @table
 GROUP BY CLINIC, APPTDATETIME
 HAVING COUNT(*) > 1

Which from the test data returned:

CLINIC  APPTDATETIME            Cnt
-----------------------------------
TGYN    2022-07-20 10:05:00.000 2
TGYN    2022-07-20 11:10:00.000 2

Indicating there are dupes for those clinic/datetime combinations.

HAVING is the magic here, we can count them up and state we only want ones which are greater than 1.

CodePudding user response：

This is basically a straight-forward conditional aggregation with group by, with the slight complication of excluding NULL rows where a corresponding appointment also exists.

For this you can include an anti-semi self-join using not exists so as to exclude counting for unfilled capacity any row where there's also valid data for the same date:

select CLINIC, Convert(date, APPTDATETIME) AppDate, 
    Sum(case when NEW_FOLLOWUP_FLAG = 'N' then 1 end) N_Capacity,
    Sum(case when NEW_FOLLOWUP_FLAG = 'f' then 1 end) F_Capacity,
    Sum(case when NEW_FOLLOWUP_FLAG is null then 1 end) U_Capacity
from t
where not exists (
  select * from t t2
  where t.PATIENT_ID is null 
    and t2.PATIENT_ID is not null 
    and t.APPTDATETIME = t2.APPTDATETIME
)
group by CLINIC, Convert(date, APPTDATETIME);