Home > Net >  How to improve performance of ORDER BY in mysql?
How to improve performance of ORDER BY in mysql?

Time:10-17

Hi I am having one table with approximately one billion rows. Doing ORDER BY on indexed field is taking around 3 seconds to fetch 30 records with LIMIT, whereas without ORDER BY it takes 195ms. I want to speed up this.

can anyone help me out with this ?

here is the simplified version of query (have removed some fields and few joins).

SELECT DISTINCT `auctions_opportunity`.`id`,
                `auctions_opportunity`.`employer_id`,
                `auctions_opportunity`.`salary`,
                `auctions_opportunity`.`is_active`,
                `auctions_opportunity`.`is_interested`,
                `auctions_opportunity`.`interview_status`,
                `auctions_opportunity`.`previous_interview_status`,
                `auctions_opportunity`.`creation_source`,
                `auctions_opportunity`.`created_at`,
                `auctions_opportunity`.`last_modified`,
                `auctions_opportunity`.`last_instant_alert_email_at`,
                `auctions_opportunity`.`last_daily_alert_email_at`,
                `auctions_opportunity`.`last_periodic_alert_email_at`,
                `auctions_opportunity`.`reviewed_at`,
                `auctions_opportunity`.`last_modified_by_id`,
                `auctions_opportunity`.`candidate_id`,
                `auctions_opportunity`.`job_id`,
                `auctions_opportunity`.`interview_request_notes`,
                `auctions_opportunity`.`application_email_at`,
                `auctions_opportunity`.`batch_application_email_at`,
                `auctions_opportunity`.`is_location_match`,
                `auctions_opportunity`.`is_strong_match`,
                `auctions_opportunity`.`score`,
                `auctions_opportunity`.`es_score`,
                `auctions_opportunity`.`message`,
                `auctions_opportunity`.`es_maybe`,
                `candidates_candidate`.`id`,
                `candidates_candidate`.`user_id`,
                `candidates_candidate`.`phone`,
                `candidates_candidate`.`is_new`,
                `candidates_candidate`.`last_seen`,
                `candidates_candidate`.`last_agreed_to_terms_at`,
                `candidates_candidate`.`email_backend_status`,
                `candidates_candidate`.`email_suppressed_at`,
                `candidates_candidate`.`email_verified_at`,
                `candidates_candidate`.`number_verified_at`,
                `candidates_candidate`.`last_emailed_at`,
                `candidates_candidate`.`last_updated`,
                `candidates_candidate`.`internal_note`,
                `candidates_candidate`.`main_skills`,
                `candidates_candidate`.`main_skills_nopunc`,
                `candidates_candidate`.`total_experience`,
                `candidates_candidate`.`current_company`,
                `candidates_candidate`.`current_company_nopunc`,
                `candidates_candidate`.`current_designation`,
                `candidates_candidate`.`onboarding_completed_at`,
                `candidates_candidate`.`previously_onboarded`,
                `candidates_candidate`.`talent_advocate_id`,
                `candidates_candidate`.`job_function_skills`,
                `candidates_candidate`.`availability`,
                `candidates_candidate`.`deactivated_at`,
                `candidates_candidate`.`deactivated_by_id`,
                `candidates_candidate`.`deactivation_source`,
                `candidates_candidate`.`is_private`,
                `candidates_candidate`.`previous_companies`,
                `candidates_candidate`.`companies_interned_at`,
                `candidates_candidate`.`gender`,
                `candidates_candidate`.`skype_id`,
                `candidates_candidate`.`alternate_phone`,
                `candidates_candidate`.`shadow_linkedin`,
                `candidates_candidate`.`last_job_post_email`,
                `candidates_candidate`.`last_job_alert_email`,
                `candidates_candidate`.`last_joining_email_sent`,
                `candidates_candidate`.`last_hired_email_sent`,
                `candidates_candidate`.`last_reonboarding_email`,
                `candidates_candidate`.`last_indexed_at`,
                `candidates_candidate`.`resume_viewed_notification_type`,
                `candidates_candidate`.`last_resume_viewed_email_at`,
                `candidates_candidate`.`seen_go_premium_modal_at`,
                `candidates_candidate`.`seen_active_check_modal_at`,
                `candidates_candidate`.`alerts_limit_reached_at`,
                `candidates_candidate`.`calculation_done_at`,
                `candidates_candidate`.`calculation_attempted_at`,
                `candidates_candidate`.`bio`,
                `candidates_candidate`.`last_seen_activity_at`,
                `candidates_candidate`.`job_unsubscribed_at`,
                `candidates_candidate`.`monthly_alerts_unsubscribed_at`,
                `candidates_candidate`.`resume_views_unsubscribed_at`,
                `candidates_candidate`.`update_preferences_unsubscribed_at`,
                `candidates_candidate`.`onboarding_reminder_unsubscribed_at`,
                `candidates_candidate`.`is_hireable`,
                `candidates_candidate`.`recruiter_message_push_unsubscribed_at`,
                `candidates_candidate`.`resume_views_email_unsubscribed_at`,
                `candidates_candidate`.`resume_views_push_unsubscribed_at`,
                `candidates_candidate`.`profile_reminder_email_unsubscribed_at`,
                `candidates_candidate`.`profile_reminder_push_unsubscribed_at`,
                `candidates_candidate`.`newsletter_email_unsubscribed_at`,
                `candidates_candidate`.`newsletter_push_unsubscribed_at`,
                `candidates_candidate`.`product_updates_email_unsubscribed_at`,
                `candidates_candidate`.`product_updates_push_unsubscribed_at`,
                `candidates_candidate`.`push_notifications_shown_at`,
                `candidates_candidate`.`push_notifications_enabled`,
                `candidates_candidate`.`push_notifications_verified_at`,
                `candidates_candidate`.`whatsapp_number`,
                `candidates_candidate`.`whatsapp_enabled`,
                `candidates_candidate`.`whatsapp_verified_at`,
                `candidates_candidate`.`last_whatsapp_sent_at`,
                `auth_user`.`id`,
                `auth_user`.`password`,
                `auth_user`.`last_login`,
                `auth_user`.`is_superuser`,
                `auth_user`.`username`,
                `auth_user`.`first_name`,
                `auth_user`.`last_name`,
                `auth_user`.`email`,
                `auth_user`.`is_staff`,
                `auth_user`.`is_active`,
                `auth_user`.`date_joined`
FROM `auctions_opportunity`
INNER JOIN `employers_employer` ON (`auctions_opportunity`.`employer_id` = `employers_employer`.`id`)
INNER JOIN `jobs_job` ON (`auctions_opportunity`.`job_id` = `jobs_job`.`id`)
INNER JOIN `profiles_profilestatus` ON (`employers_employer`.`status_id` = `profiles_profilestatus`.`id`)
INNER JOIN `candidates_candidate` ON (`auctions_opportunity`.`candidate_id` = `candidates_candidate`.`id`)
INNER JOIN `auth_user` ON (`candidates_candidate`.`user_id` = `auth_user`.`id`)
INNER JOIN `candidates_resume` ON (`candidates_candidate`.`id` = `candidates_resume`.`candidate_id`)
LEFT OUTER JOIN `candidates_jobsearchpreferences` ON (`candidates_candidate`.`id` = `candidates_jobsearchpreferences`.`candidate_id`)
LEFT OUTER JOIN `candidates_customaction` ON (`auctions_opportunity`.`id` = `candidates_customaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_emailaction` ON (`auctions_opportunity`.`id` = `candidates_emailaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_saveaction` ON (`auctions_opportunity`.`id` = `candidates_saveaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_hideaction` ON (`auctions_opportunity`.`id` = `candidates_hideaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_hireaction` ON (`auctions_opportunity`.`id` = `candidates_hireaction`.`opportunity_id`)
WHERE (`auctions_opportunity`.`employer_id` = 4
       AND NOT (`auctions_opportunity`.`job_id` IS NULL)
       AND `profiles_profilestatus`.`name` = Approved
       AND NOT (`auth_user`.`email` = deactivated@blobinfotech.com)
       AND NOT (`candidates_candidate`.`availability` = 3)
       AND NOT (`auctions_opportunity`.`job_id` IS NULL)
       AND NOT (`candidates_resume`.`id` IS NULL)
       AND (`jobs_job`.`is_active` = TRUE
            AND `auctions_opportunity`.`is_active` = TRUE)
       AND ((((`candidates_candidate`.`is_hireable` = TRUE
               AND `candidates_candidate`.`last_seen` >= 2020-12-18 09:19:42.873898
               AND `candidates_jobsearchpreferences`.`status` = 0)
              OR (`candidates_candidate`.`is_hireable` = TRUE
                  AND (NOT (`candidates_jobsearchpreferences`.`status` = 0)
                       OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.873989
                           AND `candidates_jobsearchpreferences`.`status` = 0))))
             AND `candidates_candidate`.`is_private` = FALSE)
            OR `auctions_opportunity`.`interview_status` = 1
            OR NOT (`auctions_opportunity`.`id` IN
                      (SELECT U0.`id` AS Col1
                       FROM `auctions_opportunity` U0
                       LEFT OUTER JOIN `candidates_emailaction` U1 ON (U0.`id` = U1.`opportunity_id`)
                       WHERE (U1.`reply_email_at` IS NULL
                              AND U0.`id` = (`auctions_opportunity`.`id`)))))
       AND NOT (`auctions_opportunity`.`candidate_id` IN
                  (SELECT U2.`candidate_id` AS Col1
                   FROM `candidates_blockedemployerintermediate` U2
                   WHERE U2.`employer_id` = 4))
       AND `auctions_opportunity`.`job_id` IN (40729)
       AND NOT (((`auctions_opportunity`.`creation_source` = 8
                  AND `auctions_opportunity`.`creation_source` IS NOT NULL)
                 OR (`auctions_opportunity`.`interview_status` = 1
                     AND `auctions_opportunity`.`is_active` = FALSE)))
       AND ((((`candidates_candidate`.`is_hireable` = TRUE
               AND (NOT (`candidates_jobsearchpreferences`.`status` = 0
                         AND `candidates_jobsearchpreferences`.`status` IS NOT NULL)
                    OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.863410
                        AND `candidates_jobsearchpreferences`.`status` = 0)))
              OR (`candidates_candidate`.`is_hireable` = TRUE
                  AND `candidates_jobsearchpreferences`.`status` = 2))
             AND `candidates_candidate`.`is_private` = FALSE
             AND `auctions_opportunity`.`interview_status` = 0
             AND (`auctions_opportunity`.`is_location_match` = TRUE
                  OR ((`candidates_jobsearchpreferences`.`current_location` LIKE %Delhi%
                       OR `candidates_jobsearchpreferences`.`current_location` LIKE %Noida%
                       OR `candidates_jobsearchpreferences`.`current_location` LIKE %Gurgaon%
                       OR `candidates_jobsearchpreferences`.`current_location` LIKE %Faridabad%
                       OR `candidates_jobsearchpreferences`.`current_location` LIKE %Greater Noida%)
                      AND (`jobs_job`.`locations` LIKE %Delhi%
                           OR `jobs_job`.`locations` LIKE %Noida%
                           OR `jobs_job`.`locations` LIKE %Gurgaon%
                           OR `jobs_job`.`locations` LIKE %Faridabad%
                           OR `jobs_job`.`locations` LIKE %Greater Noida%)
                      AND `jobs_job`.`accept_outstation` = FALSE)))
            OR (((`candidates_candidate`.`is_hireable` = TRUE
                  AND (NOT (`candidates_jobsearchpreferences`.`status` = 0
                            AND `candidates_jobsearchpreferences`.`status` IS NOT NULL)
                       OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.863410
                           AND `candidates_jobsearchpreferences`.`status` = 0)))
                 OR (`candidates_candidate`.`is_hireable` = TRUE
                     AND `candidates_jobsearchpreferences`.`status` = 2))
                AND `auctions_opportunity`.`reviewed_at` < 2020-12-18 09:19:42.863329
                AND `auctions_opportunity`.`interview_status` = 1))
       AND `auctions_opportunity`.`employer_id` = 4
       AND `candidates_customaction`.`id` IS NULL
       AND `candidates_emailaction`.`id` IS NULL
       AND `candidates_saveaction`.`id` IS NULL
       AND (`candidates_hideaction`.`id` IS NULL
            OR `candidates_hideaction`.`is_deleted` = TRUE)
       AND `candidates_hireaction`.`id` IS NULL
       AND `auctions_opportunity`.`employer_id` = 4)
ORDER BY `auctions_opportunity`.`reviewed_at` DESC
LIMIT 30

CodePudding user response:

While it may seem like the ORDER BY is to blame, it may well be other things causing poor performance.

  • OR in WHERE and ON should be avoided where practical.

  • NOT x IN ( SELECT ... ) should be turned into either NOT EXISTS ( SELECT 1 ... ) or LEFT JOIN ( SELECT y ... ) ... WHERE y IS NULL

  • Some possible indexes to help:

      auctions_opportunity:  INDEX(employer_id, interview_status, job_id)
      candidates_saveaction:  INDEX(opportunity_id, id)
      profiles_profilestatus:  INDEX(name)
      candidates_jobsearchpreferences:  INDEX(status)
      U1:  INDEX(reply_email_at)
    

CodePudding user response:

The general answer is rather easy. If you take a telephone book and are asked to name thirty telephone numbers (LIMIT 30 without an ORDER BY) where the person's street name contains an A, how long would that take you? A few minutes probably. Now, you are asked to name the thirty lowest telephone numbers (LIMIT 30 with an ORDER BY) where the person's street name contains an A. You'll have to read the whole book, pick out all numbers where the street name matches, then sort all entries by number and then take the first thirty. How long will that take in comparison?

So, it's not at all surprising that you see a significant difference with and without using ORDER BY. It's expected.

In your case the DBMS may just pick rows with is_location_match = TRUE and thus avoid the costly location searches with LIKE for instance (if this suffices to find 30 rows in the end). It can't do so, when it must consider all matches in order to find your top 30 rows specified by ORDER BY.

Back to the telephone book: If the telephone book has only 50 entries, the job won't be too hard. If it has a million entries, searching it will take ages. So it must be our objective to deal with a small data set. You are joining the table candidates_resume without using it. (The condition AND NOT candidates_resume.id IS NULL is superfluous and makes no sense.) You are also joining all candidates_hideaction rows where is_deleted = TRUE. Let's say there are three resumes and five deleted hide actions per candidate. Thus you are blowing up your data set by the factor of 15 (3 x 5) only to dismiss the duplicates later with DISTINCT. DISTINCT is a very costly operation on large data sets. It is also a typical indicator for a badly written query, as in your case: It is unnecessary joins that blow up your intermediate result, and rather than fixing the join problem you are applying DISTINCT in the end.

You are also using many anti joins (e.g. LEFT OUTER JOIN candidates_customaction ... WHERE candidates_customaction.id IS NULL). This had me confused at first. I wondered why you are cross joining all those candidate actions until I found the IS NULL conditions. I consider these anti joins not readable. And now let's say that there ten actions for each of the four anti-joined action types per candidate. If the DBMS does the join, it will create 10000 (10 x 10 x 10 x 10) rows for a single candidate that must all get dismissed. I'd opt for NOT EXISTS clauses instead to get this more readable and maybe avoid an unnecessarily large intermediate result. I'd use NOT EXISTS for the check for undeleted hide actions, too.

Having said this, my recommendation is that you change the query such that you can dismiss DISTINCT. This can save the DBMS a lot of work and may show in the final execution speed.

There are some minor things you could do to get the query more readable. You don't have to state it thrice that you want employer_id = 4 :-) Then, status = 0 AND status IS NOT NULL is just status = 0 of course. And NOT (status = 0 AND status IS NOT NULL) OR (last_seen < xxx AND status = 0) is hence just status <> 0 OR last_seen < xxx. This won't speed up the query, but it will make it more readable and maintainable, and maybe you even wanted to consider status NULL and made a mistake that gets more obvious in the process of cleaning the conditions up.

  • Related