I have the following dataset:
individual | clinic_1 | clinic_2 | month | address_recorded | address_code
1 | A | B | 01-01-2016 | 01-02-1999 | C01
1 | A | A | 01-01-2016 | 01-02-2003 | C02
1 | A | A | 01-01-2016 | 01-02-2001 | C06
1 | A | X | 01-01-2016 | 01-02-2000 | C03
2 | C | B | 01-04-2016 | 01-02-1999 | D04
2 | C | A | 01-04-2016 | 01-02-2001 | D05
2 | C | X | 01-04-2016 | 01-02-2000 | D06
I would like to get:
individual | clinic_1 | month | address_code
1 | A | 01-01-2016 | C02
2 | C | 01-04-2016 | D05
Criteria:
- For unique set of individual-clinic_1-month with clinic_1 = clinic_2, select the most recent date in which address was recorded within clinic_1
- For unique set of individual-clinic_1-month with NO instances where clinic_1 = clinic_2, select the most recent date in which address was recorded across clinics
I thought about doing:
with cte_1
as
(
select * from table
where clinic_1 = clinic_2
)
,cte_2
as
(
select row_number () over (Partition by clinic_1, individual, month order by clinic_1, individual, month, address_recorded desc) as number, *
from cte_1
)
select individual, clinic_1, month, address_code from cte_2 where number = 1
But I don't know how to get those individual-clinic_1-month for which there are no instances where clinic_1=clinic_2, any ideas?
CodePudding user response:
You can Union
two select queries; one to select all records where clinic_1=clinic_2
and another one to select all records where clinic_1<>clinic_2
and clinic_1
not in the results set of the first query.
Both queries are grouped by [individual],[clinic_1], [mnth]
to find all of the required data rows for each [clinic_1] - [mnth]
entry.
Check the following:
with cte as
(SELECT [individual] ,[clinic_1],[clinic_2],[mnth],max([address_recorded]) as m
FROM [MyData] where [clinic_1]=[clinic_2]
group by [individual],[clinic_1],[clinic_2] ,[mnth]
),
cte2 as
(SELECT [MyData].[individual] ,[MyData].[clinic_1],'' as [clinic_2],[MyData].[mnth],max([MyData].[address_recorded]) as m
FROM [MyData]
Left Join cte on cte.individual=MyData.individual
and cte.mnth=MyData.mnth
where [MyData].[clinic_1]<>[MyData].[clinic_2] and cte.individual IS NULL
group by [MyData].[individual],[MyData].[clinic_1], [MyData].[mnth]
),
D as
(SELECT * FROM cte
UNION
SELECT * FROM cte2)
, LastQr as(
select [MyData].individual, [MyData].clinic_1,[MyData].mnth,[MyData].address_code from D
INNER JOIN [MyData]
ON D.individual=MyData.individual and D.clinic_1=MyData.clinic_1 and D.mnth=MyData.mnth and D.m=MyData.address_recorded
and D.clinic_2=MyData.clinic_2
UNION
select [MyData].individual, [MyData].clinic_1,[MyData].mnth,[MyData].address_code from D
INNER JOIN [MyData]
ON D.individual=MyData.individual and D.clinic_1=MyData.clinic_1 and D.mnth=MyData.mnth and D.m=MyData.address_recorded
and D.clinic_2='')
,
SelFirstFromDublicate as
(select LastQr.individual, LastQr.clinic_1,LastQr.mnth,LastQr.address_code,
row_number() OVER(PARTITION BY LastQr.individual, LastQr.clinic_1,LastQr.mnth ORDER BY LastQr.individual, LastQr.clinic_1,LastQr.mnth)
as rn from LastQr)
select * from SelFirstFromDublicate where rn=1
See the results from dbfiddle.uk.