Home > Software engineering >  Filtering in sql before joining tables
Filtering in sql before joining tables

Time:08-09

For a study of multiple departments my institution wants a report of every subject who has died in the last 3 months. It includes a column that has the date of the last visit for that subject (whether followup/treatment/whatever).

It's an Oracle database and the SQL code will be run inside JasperReports (Java runs through the website). Given it's pulling from a couple huge reporting views with thousands of rows, it's going incredibly slow if at all.

Views I think I need:

  • subj_access

  • sv_user_pcl_permission

  • sv_user_pcs_access
    These three views are hidden to me but they look at your "user contact id" when you run a report and see what information you should have access to. The top chunk of code they are in is below; everything in the usa CTE has worked.

  • rv_protocol_subject_basic
    Has columns like is_expired, whether they died. Has just one row per subject but has unneccessary rows of people who are alive, i.e. is_expired = 'N'.

  • vw_subject_visits
    Has columns like visit_status, i.e. whether a visit occurred. Has thousands of unnecessary rows, one for every patient visit. I need to filter by each sequence_number, i.e. the patient, and return the row with the most recent visit_date where visit_status was 'occurred'.

I can filter early to only those visits that occurred, and only those subjects who expired, in order to not have to pull in a billion rows, but I can't figure that out.

My code:

WITH usa AS (
    SELECT subj_access.protocol_id, subj_access.protocol_subject_id
    FROM sv_user_pcl_permission priv_check
    JOIN sv_user_pcs_access subj_access ON priv_check.protocol_id = subj_access.protocol_id AND priv_check.contact_id = subj_access.contact_id
    WHERE priv_check.function_name = 'CRPT-Subject Visits'
    AND priv_check.contact_id = '1234'
    -- this is actually a live parameter that will be filled in when a person runs the report, the 1234 will be different for everyone.
)
SELECT usa.protocol_id, pd.protocol_no, sub_p.subject_no, sub_p.subject_mrn, sub_p.sequence_number, sub_p.subject_status, sub_p.is_expired, sub_p.expired_date, pd.visit_status, pd.visit_date
FROM usa
INNER JOIN RV_Protocol_subject_basic sub_p ON sub_p.protocol_id = usa.protocol_id
INNER JOIN vw_subject_visits pd ON pd.protocol_id = usa.protocol_id and pd.sequence_number = sub_p.sequence_number
WHERE sub_p.is_expired = 'Y', pd.visit_status = 'Occurred'
ORDER BY pd.Protocol_no,sub_p.expired_date

I can tell this will return several rows per subject, and I just need one, with their most recent visit that occurred.

CodePudding user response:

From my point of view, a CTE it is - but just because it should contain an analytic function which will tell you which row represents the "last visit" date for each subject. Then, in the main query, return only rows that rank the highest.

As of performance issues: make sure that columns involved in joins and where conditions are properly indexed, if necessary.

WITH usa AS 
  (SELECT subj_access.protocol_id, 
          subj_access.protocol_subject_id,
          pd.protocol_no,
          sub_p.subject_no,
          sub_p.submect_mrn,
          sub_p.sequence_number, 
          sub_p.subject_status, 
          sub_p.is_expired, 
          sub_p.expired_date, 
          pd.visit_status, 
          pd.visit_date,
          --
          row_number() OVER (PARTITION BY sub_p.subject_no ORDER BY pd.visit_date DESC) rn
   FROM sv_user_pcl_permission priv_check
   JOIN sv_user_pcs_access subj_access ON priv_check.protocol_id = subj_access.protocol_id 
                                      AND priv_check.contact_id = subj_access.contact_id
   JOIN rv_protocol_subject_basic sub_p ON sub_p.protocol_id = subj_access.protocol_id
   JOIN vw_subject_visits pd ON pd.protocol_id = subj_access.protocol_id 
                            AND pd.sequence_number = sub_p.sequence_number
   WHERE priv_check.function_name = 'CRPT-Subject Visits'
     AND priv_check.contact_id = '1234' -- this is actually ...
     AND sub_p.is_expired = 'Y'
     AND pd.visit_status = 'Occurred'
  )
SELECT * 
FROM usa
WHERE rn = 1
ORDER BY protocol_no, expired_date
  • Related