I have an occupation list which will have different versions of the occupation record added over time. This means that there will be multiple records with the same OCCUPATION_CD
and I want to filter my data set to only show the latest version of each occupation by MAJOR_VER_DATE
.
I will be looking up the unique OCCUPATION_ID
for the latest version of an occupation so that I can then lookup all child records associated to that version of the occupation.
Occupations data set
OCCUPATION_ID,OCCUPATION_CD,ANZSIC_ID,OCCUPATION_DIVISION_CODE,OCCUPATION_SUBDIVISION_CODE,OCCUPATION_CLASS,OCCUPATION_CD_DSC,MAJOR_VER,MAJOR_VER_DATE,MINOR_VER,MINOR_VER_DATE
1,0219D,A0219,A,"A_Agriculture, Forestry and Fishing",,Agricultural / Farm Fencing Contractors,1,12/06/2022,1,12/06/2022
2,5110C,G5110,G,G_Food Retailing,R,Tobacconists,1,12/06/2022,1,12/06/2022
3,5710D,H5710,H,H_Accomodation & Licensed Clubs,,Motel,1,12/06/2022,1,12/06/2022
4,8440A,N8440,N,N_Education,,Other Education not elsewhere classified (Theoretical training only),1,12/06/2022,1,12/06/2022
5,9523,Q9523,Q,Q_Other,,Photographic Studio,1,12/06/2022,1,12/06/2022
6,9524D,Q9524,Q,Q_Other,,Cemetery Operation,1,12/06/2022,1,12/06/2022
7,9525A,Q9525,Q,Q_Other,,Gardening Services,1,12/06/2022,1,12/06/2022
8,9525C,Q9525,Q,Q_Other,,Swimming Pool Cleaning & Maintenance,1,12/06/2022,1,12/06/2022
9,9529B,Q9529,Q,Q_Other,,Pet Grooming / Washing Service,1,12/06/2022,1,12/06/2022
10,9529H,Q9529,Q,Q_Other,,Chauffeur Service,1,12/06/2022,1,12/06/2022
11,7241,,J,,,Music Publishing,1,12/06/2022,1,12/06/2022
12,0219D,A0219,A,"A_Agriculture, Forestry and Fishing",,Fence Installation,2,11/07/2022,1,11/07/2022
What I am hoping is the outcome to be is that if I do a
SELECT OCCUPATION_ID FROM [data_set] WHERE OCCUPATION_CD = '0219D'
will resolve OCCUPATION_ID 12
not 1
because the record with ID 12 has a MAJOR_VER_DATE which is more recent than the record with ID 1.
Where I've got to is attempting to filter down the data_set I want to only show the latest version is below. I elected to use the unique OCCUPATION_ID
as my way of getting the latest version instead of the MAJOR_VER_DATE
because it feels more reliable.
What I then need to do make the results of this be the table that I'm querying on the OCCUPATION_CD
to resolve the OCCUPATION_ID
. How would you do this? It's like a three layered query, I can't get my head around it. Thanks
SELECT latest_versions.*
FROM `OCCUPATIONS` latest_versions JOIN
(
SELECT OCCUPATION_CD, MAX(OCCUPATION_ID) as latest_version
FROM OCCUPATIONS
GROUP BY OCCUPATION_CD
) occ_filtered
ON latest_versions.OCCUPATION_CD = occ_filtered.OCCUPATION_CD
AND OCCUPATION_ID = latest_version
CodePudding user response:
SELECT * from
occupations A JOIN (
SELECT occupation_cd, max(major_ver_date) max_date
FROM occupations B
GROUP by occupation_cd
) C on a.ocupation_cd=c.occupation_cd and a.major_ver_date=c.max_date
WHERE A.occupation_id = (
SELECT top 1 occupation_id
FROM occupations D
WHERE D.occupation_cd=A.occupation_cd and D.major_ver_date=A.major_ver_date
ORDER BY 1 DESC
)
Specific sql implementation could allow to much simpler query.
CodePudding user response:
with data as (
select *, row_number()
over (partition by occupation_cd order by major_ver_date desc) as rn
from occupations
)
select * from data where rn = 1;
Or following your line of approach:
SELECT * FROM OCCUPATIONS WHERE OCCUPATION_ID IN (
SELECT OCCUPATION_ID
FROM `OCCUPATIONS` latest_versions JOIN (
SELECT OCCUPATION_CD, MAX(MAJOR_VERSION_DATE) as latest_version
FROM OCCUPATIONS
GROUP BY OCCUPATION_CD
) occ_filtered
ON latest_versions.OCCUPATION_CD = occ_filtered.OCCUPATION_CD
AND MAJOR_VERSION_DATE = latest_version
)