For a study of multiple departments my institution wants a report of every subject who has died in the last 3 months. It includes a column that has the date of the last visit for that subject (whether followup/treatment/whatever).
It's an Oracle database and the SQL code will be run inside JasperReports (Java runs through the website). Given it's pulling from a couple huge reporting views with thousands of rows, it's going incredibly slow if at all.
Views I think I need:
subj_access
sv_user_pcl_permission
sv_user_pcs_access
These three views are hidden to me but they look at your "user contact id" when you run a report and see what information you should have access to. The top chunk of code they are in is below; everything in theusa
CTE has worked.rv_protocol_subject_basic
Has columns likeis_expired
, whether they died. Has just one row per subject but has unneccessary rows of people who are alive, i.e.is_expired = 'N'
.vw_subject_visits
Has columns likevisit_status
, i.e. whether a visit occurred. Has thousands of unnecessary rows, one for every patient visit. I need to filter by eachsequence_number
, i.e. the patient, and return the row with the most recentvisit_date
wherevisit_status
was'occurred'
.
I can filter early to only those visits that occurred, and only those subjects who expired, in order to not have to pull in a billion rows, but I can't figure that out.
My code:
WITH usa AS (
SELECT subj_access.protocol_id, subj_access.protocol_subject_id
FROM sv_user_pcl_permission priv_check
JOIN sv_user_pcs_access subj_access ON priv_check.protocol_id = subj_access.protocol_id AND priv_check.contact_id = subj_access.contact_id
WHERE priv_check.function_name = 'CRPT-Subject Visits'
AND priv_check.contact_id = '1234'
-- this is actually a live parameter that will be filled in when a person runs the report, the 1234 will be different for everyone.
)
SELECT usa.protocol_id, pd.protocol_no, sub_p.subject_no, sub_p.subject_mrn, sub_p.sequence_number, sub_p.subject_status, sub_p.is_expired, sub_p.expired_date, pd.visit_status, pd.visit_date
FROM usa
INNER JOIN RV_Protocol_subject_basic sub_p ON sub_p.protocol_id = usa.protocol_id
INNER JOIN vw_subject_visits pd ON pd.protocol_id = usa.protocol_id and pd.sequence_number = sub_p.sequence_number
WHERE sub_p.is_expired = 'Y', pd.visit_status = 'Occurred'
ORDER BY pd.Protocol_no,sub_p.expired_date
I can tell this will return several rows per subject, and I just need one, with their most recent visit that occurred.
CodePudding user response:
From my point of view, a CTE it is - but just because it should contain an analytic function which will tell you which row represents the "last visit" date for each subject. Then, in the main query, return only rows that rank the highest.
As of performance issues: make sure that columns involved in joins and where
conditions are properly indexed, if necessary.
WITH usa AS
(SELECT subj_access.protocol_id,
subj_access.protocol_subject_id,
pd.protocol_no,
sub_p.subject_no,
sub_p.submect_mrn,
sub_p.sequence_number,
sub_p.subject_status,
sub_p.is_expired,
sub_p.expired_date,
pd.visit_status,
pd.visit_date,
--
row_number() OVER (PARTITION BY sub_p.subject_no ORDER BY pd.visit_date DESC) rn
FROM sv_user_pcl_permission priv_check
JOIN sv_user_pcs_access subj_access ON priv_check.protocol_id = subj_access.protocol_id
AND priv_check.contact_id = subj_access.contact_id
JOIN rv_protocol_subject_basic sub_p ON sub_p.protocol_id = subj_access.protocol_id
JOIN vw_subject_visits pd ON pd.protocol_id = subj_access.protocol_id
AND pd.sequence_number = sub_p.sequence_number
WHERE priv_check.function_name = 'CRPT-Subject Visits'
AND priv_check.contact_id = '1234' -- this is actually ...
AND sub_p.is_expired = 'Y'
AND pd.visit_status = 'Occurred'
)
SELECT *
FROM usa
WHERE rn = 1
ORDER BY protocol_no, expired_date