Hi I am having one table with approximately one billion rows. Doing ORDER BY on indexed field is taking around 3 seconds to fetch 30 records with LIMIT, whereas without ORDER BY it takes 195ms. I want to speed up this.
can anyone help me out with this ?
here is the simplified version of query (have removed some fields and few joins).
SELECT DISTINCT `auctions_opportunity`.`id`,
`auctions_opportunity`.`employer_id`,
`auctions_opportunity`.`salary`,
`auctions_opportunity`.`is_active`,
`auctions_opportunity`.`is_interested`,
`auctions_opportunity`.`interview_status`,
`auctions_opportunity`.`previous_interview_status`,
`auctions_opportunity`.`creation_source`,
`auctions_opportunity`.`created_at`,
`auctions_opportunity`.`last_modified`,
`auctions_opportunity`.`last_instant_alert_email_at`,
`auctions_opportunity`.`last_daily_alert_email_at`,
`auctions_opportunity`.`last_periodic_alert_email_at`,
`auctions_opportunity`.`reviewed_at`,
`auctions_opportunity`.`last_modified_by_id`,
`auctions_opportunity`.`candidate_id`,
`auctions_opportunity`.`job_id`,
`auctions_opportunity`.`interview_request_notes`,
`auctions_opportunity`.`application_email_at`,
`auctions_opportunity`.`batch_application_email_at`,
`auctions_opportunity`.`is_location_match`,
`auctions_opportunity`.`is_strong_match`,
`auctions_opportunity`.`score`,
`auctions_opportunity`.`es_score`,
`auctions_opportunity`.`message`,
`auctions_opportunity`.`es_maybe`,
`candidates_candidate`.`id`,
`candidates_candidate`.`user_id`,
`candidates_candidate`.`phone`,
`candidates_candidate`.`is_new`,
`candidates_candidate`.`last_seen`,
`candidates_candidate`.`last_agreed_to_terms_at`,
`candidates_candidate`.`email_backend_status`,
`candidates_candidate`.`email_suppressed_at`,
`candidates_candidate`.`email_verified_at`,
`candidates_candidate`.`number_verified_at`,
`candidates_candidate`.`last_emailed_at`,
`candidates_candidate`.`last_updated`,
`candidates_candidate`.`internal_note`,
`candidates_candidate`.`main_skills`,
`candidates_candidate`.`main_skills_nopunc`,
`candidates_candidate`.`total_experience`,
`candidates_candidate`.`current_company`,
`candidates_candidate`.`current_company_nopunc`,
`candidates_candidate`.`current_designation`,
`candidates_candidate`.`onboarding_completed_at`,
`candidates_candidate`.`previously_onboarded`,
`candidates_candidate`.`talent_advocate_id`,
`candidates_candidate`.`job_function_skills`,
`candidates_candidate`.`availability`,
`candidates_candidate`.`deactivated_at`,
`candidates_candidate`.`deactivated_by_id`,
`candidates_candidate`.`deactivation_source`,
`candidates_candidate`.`is_private`,
`candidates_candidate`.`previous_companies`,
`candidates_candidate`.`companies_interned_at`,
`candidates_candidate`.`gender`,
`candidates_candidate`.`skype_id`,
`candidates_candidate`.`alternate_phone`,
`candidates_candidate`.`shadow_linkedin`,
`candidates_candidate`.`last_job_post_email`,
`candidates_candidate`.`last_job_alert_email`,
`candidates_candidate`.`last_joining_email_sent`,
`candidates_candidate`.`last_hired_email_sent`,
`candidates_candidate`.`last_reonboarding_email`,
`candidates_candidate`.`last_indexed_at`,
`candidates_candidate`.`resume_viewed_notification_type`,
`candidates_candidate`.`last_resume_viewed_email_at`,
`candidates_candidate`.`seen_go_premium_modal_at`,
`candidates_candidate`.`seen_active_check_modal_at`,
`candidates_candidate`.`alerts_limit_reached_at`,
`candidates_candidate`.`calculation_done_at`,
`candidates_candidate`.`calculation_attempted_at`,
`candidates_candidate`.`bio`,
`candidates_candidate`.`last_seen_activity_at`,
`candidates_candidate`.`job_unsubscribed_at`,
`candidates_candidate`.`monthly_alerts_unsubscribed_at`,
`candidates_candidate`.`resume_views_unsubscribed_at`,
`candidates_candidate`.`update_preferences_unsubscribed_at`,
`candidates_candidate`.`onboarding_reminder_unsubscribed_at`,
`candidates_candidate`.`is_hireable`,
`candidates_candidate`.`recruiter_message_push_unsubscribed_at`,
`candidates_candidate`.`resume_views_email_unsubscribed_at`,
`candidates_candidate`.`resume_views_push_unsubscribed_at`,
`candidates_candidate`.`profile_reminder_email_unsubscribed_at`,
`candidates_candidate`.`profile_reminder_push_unsubscribed_at`,
`candidates_candidate`.`newsletter_email_unsubscribed_at`,
`candidates_candidate`.`newsletter_push_unsubscribed_at`,
`candidates_candidate`.`product_updates_email_unsubscribed_at`,
`candidates_candidate`.`product_updates_push_unsubscribed_at`,
`candidates_candidate`.`push_notifications_shown_at`,
`candidates_candidate`.`push_notifications_enabled`,
`candidates_candidate`.`push_notifications_verified_at`,
`candidates_candidate`.`whatsapp_number`,
`candidates_candidate`.`whatsapp_enabled`,
`candidates_candidate`.`whatsapp_verified_at`,
`candidates_candidate`.`last_whatsapp_sent_at`,
`auth_user`.`id`,
`auth_user`.`password`,
`auth_user`.`last_login`,
`auth_user`.`is_superuser`,
`auth_user`.`username`,
`auth_user`.`first_name`,
`auth_user`.`last_name`,
`auth_user`.`email`,
`auth_user`.`is_staff`,
`auth_user`.`is_active`,
`auth_user`.`date_joined`
FROM `auctions_opportunity`
INNER JOIN `employers_employer` ON (`auctions_opportunity`.`employer_id` = `employers_employer`.`id`)
INNER JOIN `jobs_job` ON (`auctions_opportunity`.`job_id` = `jobs_job`.`id`)
INNER JOIN `profiles_profilestatus` ON (`employers_employer`.`status_id` = `profiles_profilestatus`.`id`)
INNER JOIN `candidates_candidate` ON (`auctions_opportunity`.`candidate_id` = `candidates_candidate`.`id`)
INNER JOIN `auth_user` ON (`candidates_candidate`.`user_id` = `auth_user`.`id`)
INNER JOIN `candidates_resume` ON (`candidates_candidate`.`id` = `candidates_resume`.`candidate_id`)
LEFT OUTER JOIN `candidates_jobsearchpreferences` ON (`candidates_candidate`.`id` = `candidates_jobsearchpreferences`.`candidate_id`)
LEFT OUTER JOIN `candidates_customaction` ON (`auctions_opportunity`.`id` = `candidates_customaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_emailaction` ON (`auctions_opportunity`.`id` = `candidates_emailaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_saveaction` ON (`auctions_opportunity`.`id` = `candidates_saveaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_hideaction` ON (`auctions_opportunity`.`id` = `candidates_hideaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_hireaction` ON (`auctions_opportunity`.`id` = `candidates_hireaction`.`opportunity_id`)
WHERE (`auctions_opportunity`.`employer_id` = 4
AND NOT (`auctions_opportunity`.`job_id` IS NULL)
AND `profiles_profilestatus`.`name` = Approved
AND NOT (`auth_user`.`email` = deactivated@blobinfotech.com)
AND NOT (`candidates_candidate`.`availability` = 3)
AND NOT (`auctions_opportunity`.`job_id` IS NULL)
AND NOT (`candidates_resume`.`id` IS NULL)
AND (`jobs_job`.`is_active` = TRUE
AND `auctions_opportunity`.`is_active` = TRUE)
AND ((((`candidates_candidate`.`is_hireable` = TRUE
AND `candidates_candidate`.`last_seen` >= 2020-12-18 09:19:42.873898
AND `candidates_jobsearchpreferences`.`status` = 0)
OR (`candidates_candidate`.`is_hireable` = TRUE
AND (NOT (`candidates_jobsearchpreferences`.`status` = 0)
OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.873989
AND `candidates_jobsearchpreferences`.`status` = 0))))
AND `candidates_candidate`.`is_private` = FALSE)
OR `auctions_opportunity`.`interview_status` = 1
OR NOT (`auctions_opportunity`.`id` IN
(SELECT U0.`id` AS Col1
FROM `auctions_opportunity` U0
LEFT OUTER JOIN `candidates_emailaction` U1 ON (U0.`id` = U1.`opportunity_id`)
WHERE (U1.`reply_email_at` IS NULL
AND U0.`id` = (`auctions_opportunity`.`id`)))))
AND NOT (`auctions_opportunity`.`candidate_id` IN
(SELECT U2.`candidate_id` AS Col1
FROM `candidates_blockedemployerintermediate` U2
WHERE U2.`employer_id` = 4))
AND `auctions_opportunity`.`job_id` IN (40729)
AND NOT (((`auctions_opportunity`.`creation_source` = 8
AND `auctions_opportunity`.`creation_source` IS NOT NULL)
OR (`auctions_opportunity`.`interview_status` = 1
AND `auctions_opportunity`.`is_active` = FALSE)))
AND ((((`candidates_candidate`.`is_hireable` = TRUE
AND (NOT (`candidates_jobsearchpreferences`.`status` = 0
AND `candidates_jobsearchpreferences`.`status` IS NOT NULL)
OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.863410
AND `candidates_jobsearchpreferences`.`status` = 0)))
OR (`candidates_candidate`.`is_hireable` = TRUE
AND `candidates_jobsearchpreferences`.`status` = 2))
AND `candidates_candidate`.`is_private` = FALSE
AND `auctions_opportunity`.`interview_status` = 0
AND (`auctions_opportunity`.`is_location_match` = TRUE
OR ((`candidates_jobsearchpreferences`.`current_location` LIKE %Delhi%
OR `candidates_jobsearchpreferences`.`current_location` LIKE %Noida%
OR `candidates_jobsearchpreferences`.`current_location` LIKE %Gurgaon%
OR `candidates_jobsearchpreferences`.`current_location` LIKE %Faridabad%
OR `candidates_jobsearchpreferences`.`current_location` LIKE %Greater Noida%)
AND (`jobs_job`.`locations` LIKE %Delhi%
OR `jobs_job`.`locations` LIKE %Noida%
OR `jobs_job`.`locations` LIKE %Gurgaon%
OR `jobs_job`.`locations` LIKE %Faridabad%
OR `jobs_job`.`locations` LIKE %Greater Noida%)
AND `jobs_job`.`accept_outstation` = FALSE)))
OR (((`candidates_candidate`.`is_hireable` = TRUE
AND (NOT (`candidates_jobsearchpreferences`.`status` = 0
AND `candidates_jobsearchpreferences`.`status` IS NOT NULL)
OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.863410
AND `candidates_jobsearchpreferences`.`status` = 0)))
OR (`candidates_candidate`.`is_hireable` = TRUE
AND `candidates_jobsearchpreferences`.`status` = 2))
AND `auctions_opportunity`.`reviewed_at` < 2020-12-18 09:19:42.863329
AND `auctions_opportunity`.`interview_status` = 1))
AND `auctions_opportunity`.`employer_id` = 4
AND `candidates_customaction`.`id` IS NULL
AND `candidates_emailaction`.`id` IS NULL
AND `candidates_saveaction`.`id` IS NULL
AND (`candidates_hideaction`.`id` IS NULL
OR `candidates_hideaction`.`is_deleted` = TRUE)
AND `candidates_hireaction`.`id` IS NULL
AND `auctions_opportunity`.`employer_id` = 4)
ORDER BY `auctions_opportunity`.`reviewed_at` DESC
LIMIT 30
CodePudding user response:
While it may seem like the ORDER BY
is to blame, it may well be other things causing poor performance.
OR
inWHERE
andON
should be avoided where practical.NOT x IN ( SELECT ... )
should be turned into eitherNOT EXISTS ( SELECT 1 ... )
orLEFT JOIN ( SELECT y ... ) ... WHERE y IS NULL
Some possible indexes to help:
auctions_opportunity: INDEX(employer_id, interview_status, job_id) candidates_saveaction: INDEX(opportunity_id, id) profiles_profilestatus: INDEX(name) candidates_jobsearchpreferences: INDEX(status) U1: INDEX(reply_email_at)
CodePudding user response:
The general answer is rather easy. If you take a telephone book and are asked to name thirty telephone numbers (LIMIT 30
without an ORDER BY
) where the person's street name contains an A, how long would that take you? A few minutes probably. Now, you are asked to name the thirty lowest telephone numbers (LIMIT 30
with an ORDER BY
) where the person's street name contains an A. You'll have to read the whole book, pick out all numbers where the street name matches, then sort all entries by number and then take the first thirty. How long will that take in comparison?
So, it's not at all surprising that you see a significant difference with and without using ORDER BY
. It's expected.
In your case the DBMS may just pick rows with is_location_match = TRUE
and thus avoid the costly location searches with LIKE
for instance (if this suffices to find 30 rows in the end). It can't do so, when it must consider all matches in order to find your top 30 rows specified by ORDER BY
.
Back to the telephone book: If the telephone book has only 50 entries, the job won't be too hard. If it has a million entries, searching it will take ages. So it must be our objective to deal with a small data set. You are joining the table candidates_resume
without using it. (The condition AND NOT candidates_resume.id IS NULL
is superfluous and makes no sense.) You are also joining all candidates_hideaction
rows where is_deleted = TRUE
. Let's say there are three resumes and five deleted hide actions per candidate. Thus you are blowing up your data set by the factor of 15 (3 x 5) only to dismiss the duplicates later with DISTINCT
. DISTINCT
is a very costly operation on large data sets. It is also a typical indicator for a badly written query, as in your case: It is unnecessary joins that blow up your intermediate result, and rather than fixing the join problem you are applying DISTINCT
in the end.
You are also using many anti joins (e.g. LEFT OUTER JOIN candidates_customaction
... WHERE candidates_customaction.id IS NULL
). This had me confused at first. I wondered why you are cross joining all those candidate actions until I found the IS NULL
conditions. I consider these anti joins not readable. And now let's say that there ten actions for each of the four anti-joined action types per candidate. If the DBMS does the join, it will create 10000 (10 x 10 x 10 x 10) rows for a single candidate that must all get dismissed. I'd opt for NOT EXISTS
clauses instead to get this more readable and maybe avoid an unnecessarily large intermediate result. I'd use NOT EXISTS
for the check for undeleted hide actions, too.
Having said this, my recommendation is that you change the query such that you can dismiss DISTINCT
. This can save the DBMS a lot of work and may show in the final execution speed.
There are some minor things you could do to get the query more readable. You don't have to state it thrice that you want employer_id = 4
:-) Then, status = 0 AND status IS NOT NULL
is just status = 0
of course. And NOT (status = 0 AND status IS NOT NULL) OR (last_seen < xxx AND status = 0)
is hence just status <> 0 OR last_seen < xxx
. This won't speed up the query, but it will make it more readable and maintainable, and maybe you even wanted to consider status NULL and made a mistake that gets more obvious in the process of cleaning the conditions up.