Currently have a dataset of 'visits' with the day, patient, provider, the rank of the appointment (sequential number of that appointment for that patient) and rank of the appointment/provider (sequential number of that appointment for that patient/provider)
Need to use this to do a lookback on the first of each month (1/1/2020 start), for each patient look at the prior 7 visits (with no limit on how far this is) and grab the provider that had the majority of those visits to associate with that patient for that month.
If the patient only had less than 7 visits, still apply this logic. If the patient has an even number of visits that leads to a tie of these providers, the winner will be the most recent one. Patients may not have any appointments for that month period.
So for example patient 'A' the base data is like this...
ApptDate | Patient | Provider | ApptRank | ApptProvRank |
1/15/2020| A | BOB | 1 | 1 |
3/01/2020| A | BOB | 2 | 2 |
3/08/2020| A | BOB | 3 | 3 |
3/20/2020| A | BOB | 4 | 4 |
4/01/2020| A | BOB | 5 | 5 |
4/15/2020| A | SMITH | 6 | 1 |
6/07/2020| A | SMITH | 7 | 2 |
6/21/2020| A | BOB | 8 | 6 |
7/01/2020| A | JANE | 9 | 1 |
7/15/2020| A | JANE | 10 | 2 |
7/21/2020| A | JANE | 11 | 3 |
8/01/2020| A | JANE | 12 | 4 |
8/15/2020| A | JANE | 13 | 5 |
9/20/2020| A | JANE | 14 | 6 |
I then have a process that fills in missing months and counts the number of appts within that month for each provider and that looks like this...
Month | Patient | Provider | Appts |Cumulative Appt Rank | Cumulative Appt Provider Rank
1/1/2020 | A | BOB | 1 | 1 | 1
2/1/2020 | A | NULL | NULL | NULL | NULL
3/1/2020 | A | BOB | 3 | 4 | 4
4/1/2020 | A | BOB | 1 | 5 | 5
4/1/2020 | A | SMITH | 1 | 6 | 1
5/1/2020 | A | NULL | NULL | NULL | NULL
6/1/2020 | A | SMITH | 1 | 7 | 2
6/1/2020 | A | BOB | 1 | 8 | 6
7/1/2020 | A | JANE | 3 | 11 | 3
8/1/2020 | A | JANE | 2 | 13 | 5
9/1/2020 | A | JANE | 2 | 14 | 6
What I want is something that looks like this: (ONE month ONE provider and the share of Appts out of the 7 prior - which should never be higher than 7)
Month | Patient | Provider | Appt Count (of Past 7)
2/1/2020 | A | BOB | 1 | ----\> starting 2/1/2020 for the 'lookback'
3/1/2020 | A | BOB | 1 |
4/1/2020 | A | BOB | 4 |
5/1/2020 | A | BOB | 5 |
6/1/2020 | A | BOB | 5 |
7/1/2020 | A | BOB | 5 |
8/1/2020 | A | JANE | 3 |
9/1/2020 | A | JANE | 5 |
10/1/2020| A | JANE | 6 |
This would involve multiple patients, but to simplify for this example it's just one. Greatly appreciate any help on this one
CodePudding user response:
After converting your example data into DDL/DML:
DECLARE @Visits TABLE (ID INT IDENTITY, ApptDate DATETIME, Patient NVARCHAR(10), Provider NVARCHAR(10), ApptRank INT, ApptProvRank INT);
INSERT INTO @Visits (ApptDate, Patient, Provider, ApptRank, ApptProvRank) VALUES
('1/15/2020', 'A', 'BOB ', 1 , 1),
('3/01/2020', 'A', 'BOB ', 2 , 2),
('3/08/2020', 'A', 'BOB ', 3 , 3),
('3/20/2020', 'A', 'BOB ', 4 , 4),
('4/01/2020', 'A', 'BOB ', 5 , 5),
('4/15/2020', 'A', 'SMITH ', 6 , 1),
('6/07/2020', 'A', 'SMITH ', 7 , 2),
('6/21/2020', 'A', 'BOB ', 8 , 6),
('7/01/2020', 'A', 'JANE ', 9 , 1),
('7/15/2020', 'A', 'JANE ', 10, 2),
('7/21/2020', 'A', 'JANE ', 11, 3),
('8/01/2020', 'A', 'JANE ', 12, 4),
('8/15/2020', 'A', 'JANE ', 13, 5),
('9/20/2020', 'A', 'JANE ', 14, 6),
('1/15/2021', 'B', 'BOB ', 1 , 1),
('3/01/2021', 'B', 'BOB ', 2 , 2),
('3/08/2021', 'B', 'BOB ', 3 , 3),
('3/20/2021', 'B', 'BOB ', 4 , 4),
('4/01/2021', 'B', 'BOB ', 5 , 5),
('4/15/2021', 'B', 'SMITH ', 6 , 1),
('6/07/2021', 'B', 'SMITH ', 7 , 2),
('6/21/2021', 'B', 'JANE ', 8 , 6),
('7/01/2021', 'B', 'JANE ', 9 , 1),
('7/15/2021', 'B', 'JANE ', 10, 2),
('7/21/2021', 'B', 'JANE ', 11, 3),
('8/01/2021', 'B', 'JANE ', 12, 4),
('8/15/2021', 'B', 'JANE ', 13, 5),
('9/20/2021', 'B', 'JANE ', 14, 6);
We can look at this data via windowed functions to determine the most popular provider for a patient over the last 7 visits (regardless of duration) and then join back to it to see the most popular provider for the 7 visit window for each appointment.
;WITH top7VisitProvider AS (
SELECT patient, a.ApptRank, a.v7Provider, a.ProviderLast7Cnt, ROW_NUMBER() OVER (PARTITION BY a.Patient, a.ApptRank ORDER BY a.ProviderLast7Cnt DESC) AS rn
FROM (
SELECT v.ID, v.ApptDate, v.Patient, v.Provider, v.ApptRank, v.ApptProvRank, vr.Provider AS v7Provider, COUNT(*) AS ProviderLast7Cnt
FROM @Visits v
LEFT OUTER JOIN @Visits vr
ON v.Patient = vr.Patient
AND vr.ApptRank BETWEEN v.ApptRank-7 AND v.ApptRank
GROUP BY v.ID, v.ApptDate, v.Patient, v.Provider, v.ApptRank, v.ApptProvRank, vr.Provider
) a
)
SELECT v.ID, v.ApptDate, v.Patient, v.Provider, v.ApptRank, v.ApptProvRank, t7vp.v7Provider
FROM @Visits v
INNER JOIN top7VisitProvider t7vp
ON v.Patient = t7vp.Patient
AND v.ApptRank = t7vp.ApptRank
AND t7vp.rn = 1
ID ApptDate Patient Provider ApptRank ApptProvRank v7Provider
--------------------------------------------------------------------------------------
1 2020-01-15 00:00:00.000 A BOB 1 1 BOB
2 2020-03-01 00:00:00.000 A BOB 2 2 BOB
3 2020-03-08 00:00:00.000 A BOB 3 3 BOB
4 2020-03-20 00:00:00.000 A BOB 4 4 BOB
5 2020-04-01 00:00:00.000 A BOB 5 5 BOB
6 2020-04-15 00:00:00.000 A SMITH 6 1 BOB
7 2020-06-07 00:00:00.000 A SMITH 7 2 BOB
8 2020-06-21 00:00:00.000 A BOB 8 6 BOB
9 2020-07-01 00:00:00.000 A JANE 9 1 BOB
10 2020-07-15 00:00:00.000 A JANE 10 2 BOB
11 2020-07-21 00:00:00.000 A JANE 11 3 BOB
12 2020-08-01 00:00:00.000 A JANE 12 4 JANE
13 2020-08-15 00:00:00.000 A JANE 13 5 JANE
14 2020-09-20 00:00:00.000 A JANE 14 6 JANE
15 2021-01-15 00:00:00.000 B BOB 1 1 BOB
16 2021-03-01 00:00:00.000 B BOB 2 2 BOB
17 2021-03-08 00:00:00.000 B BOB 3 3 BOB
18 2021-03-20 00:00:00.000 B BOB 4 4 BOB
19 2021-04-01 00:00:00.000 B BOB 5 5 BOB
20 2021-04-15 00:00:00.000 B SMITH 6 1 BOB
21 2021-06-07 00:00:00.000 B SMITH 7 2 BOB
22 2021-06-21 00:00:00.000 B JANE 8 6 BOB
23 2021-07-01 00:00:00.000 B JANE 9 1 BOB
24 2021-07-15 00:00:00.000 B JANE 10 2 BOB
25 2021-07-21 00:00:00.000 B JANE 11 3 JANE
26 2021-08-01 00:00:00.000 B JANE 12 4 JANE
27 2021-08-15 00:00:00.000 B JANE 13 5 JANE
28 2021-09-20 00:00:00.000 B JANE 14 6 JANE
CodePudding user response:
Maybe this script can help as a first step for all those colleagues that are interest to analyze this case:
CREATE table bu_datalake.PatientProvider
(
ApptDate Date,
Patient String,
Provider String
);
INSERT INTO bu_datalake.PatientProvider
VALUES
('2022-01-15','A','BOB'),
('2022-03-01','A','BOB'),
('2022-03-08','A','BOB'),
('2022-03-20','A','BOB'),
('2022-04-01','A','BOB'),
('2022-04-15','A','SMITH'),
('2022-06-07','A','SMITH'),
('2022-06-21','A','BOB'),
('2022-07-01','A','JANE'),
('2022-07-15','A','JANE'),
('2022-07-21','A','JANE'),
('2022-08-01','A','JANE'),
('2022-08-15','A','JANE'),
('2022-09-20','A','JANE')