Home > Mobile >  Improve speed of job-applicant SQL query with a bunch of left joins
Improve speed of job-applicant SQL query with a bunch of left joins

Time:06-02

Below is a close approximation of a Postgres query that takes about 3 seconds to complete. I'd love to improve the performance. (I've changed the table names and a couple of the values to protect some information.)

The tables seem to be appropriately indexed. I suspect the left joins are the killer here. Would it be worthwhile to write the query the other way around, where we start by finding the SSNs and then work our way up to the job application? Or are there any other ways to speed things up here?

-- Determine whether the same job applicant has applied (and been rejected)
-- for a job at the same retailer at some point in the past.  If so,
-- return the IDs of the rejected job application(s).

-- job_application table has ~2 million rows
-- applicant_details table has ~1.6 million rows
-- applicant_profile table has ~1.4 million rows
-- applicant_ssn table has ~1.4 million rows

-- It is possible for job applicants to have two profiles: a primary profile
-- and, optionally, a secondary profile.  Thus, we need to look up details
-- for both profiles.

SELECT job_application.id
FROM   job_application
       INNER JOIN applicant_details
            ON ( job_application_id.id = applicant_details.job_application_id )
       LEFT OUTER JOIN applicant_profile
            ON ( applicant_details.primary_profile_id = applicant_profile.id )
       LEFT OUTER JOIN applicant_ssn
            ON ( applicant_profile.id = applicant_ssn.profile_id )
       LEFT OUTER JOIN applicant_profile T6
            ON ( applicant_details.secondary_profile_id = T6.id )
       LEFT OUTER JOIN applicant_ssn T7
            ON ( T6.id = T7.profile_id )
WHERE  ( job_application.client_id = 1023
         AND job_application.status = 'rejected'
         AND ( applicant_ssn.number = '[encrypted value]'
                OR T7.number = '[encrypted value]' )
)
ORDER  BY job_application.id DESC 

Update: Here's the lightly redacted EXPLAIN:

Nested Loop Left Join  (cost=2.57..6630.95 rows=1 width=4)
  Filter: ((ssn.number = '[encrypted value]'::text) OR (t7.number = '[encrypted value]'::text))
  ->  Nested Loop Left Join  (cost=2.14..6526.78 rows=208 width=19)
        ->  Nested Loop Left Join  (cost=1.71..6426.63 rows=202 width=12)
              ->  Nested Loop Left Join  (cost=1.28..6333.56 rows=202 width=12)
                    ->  Nested Loop  (cost=0.85..6240.49 rows=202 width=12)
                          ->  Index Scan using job_application_client_id on job_application ja  (cost=0.43..4298.93 rows=232 width=4)
                                Index Cond: (client_id = 97)
                                Filter: ((id <> 8936) AND ((status)::text = 'rejected'::text))
                          ->  Index Scan using applicant_details_6bc0a4eb on applicant_details ad  (cost=0.43..8.36 rows=1 width=12)
                                Index Cond: (application_id = ja.id)
                    ->  Index Only Scan using applicant_profile_pkey on applicant_profile ap  (cost=0.43..0.46 rows=1 width=4)
                          Index Cond: (id = ad.primary_profile_id)
              ->  Index Only Scan using applicant_profile_pkey on applicant_profile t6  (cost=0.43..0.46 rows=1 width=4)
                    Index Cond: (id = ad.coapplicant_profile_id)
        ->  Index Scan using applicant_ssn_83a0eb3f on applicant_ssn ssn  (cost=0.43..0.49 rows=1 width=15)
              Index Cond: (ap.id = profile_id)
  ->  Index Scan using applicant_ssn_83a0eb3f on applicant_ssn t7  (cost=0.43..0.49 rows=1 width=15)
        Index Cond: (t6.id = profile_id)

CodePudding user response:

Basically, you should start from the SSN, and:

SELECT profile_id FROM applicant_ssn WHERE  number = %s
UNION
SELECT the id going through the secondary profile

Now you have one or two profile ids for this applicant. Simply use the above as the first table in jour join, then join that to job_applications.

The whole query shouldn't take more than a couple hundred microseconds.

CodePudding user response:

Remove the tables you don't SELECT from from the main query, push the two OR legs and their tables into two EXISTS() subqueries (untested!)


SELECT ja.id
FROM   job_application ja
        -- (assuming an 1-1 relation here ...)
JOIN applicant_details ad ON ad.job_application_id = ja.id 
WHERE ja.client_id = 1023
AND ja.status = 'rejected'
AND (
        EXISTS (
                SELECT *
                FROM applicant_profile ap
                JOIN applicant_ssn ss ON ss.profile_id = ap.id AND ss.number = '[encrypted value]'
                WHERE 1=1
                AND ap.id = ad.primary_profile_id
                )
        OR EXISTS (
                SELECT *
                FROM applicant_profile ap
                JOIN applicant_ssn ss ON ss.profile_id = ap.id AND ss.number = '[encrypted value]'
                WHERE 1=1
                AND ap.id = ad.secondary_profile_id
                )
        )
ORDER  BY ja.id DESC;
  • Related