Home > database >  Nested MariaDB Query Slow
Nested MariaDB Query Slow

Time:09-29

I am having performance issues with a query, I have 21 million records across the table, and 2 of the tables I'm looking in here have 8 million each; individually, they are very quick. But I've done a query that, in my opinion, isn't very good, but it's the only way I know how to do it.

This query takes 65 seconds, I need to get it under 1 second and I think it's possible if I don't have all the SELECT queries, but once again, I am not sure how else to do it with my SQL knowledge.

Database server version is MariaDB 10.6.

SELECT
pa.`slug`,
(
    SELECT 
    SUM(`impressions`) 
    FROM `rh_pages_gsc_country` 
    WHERE `page_id` = pa.`page_id` 
    AND `country` = 'aus'
    AND `date_id` IN 
        (
            SELECT `date_id` 
            FROM `rh_pages_gsc_dates` 
            WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
        )
) as au_impressions,
(
    SELECT 
    SUM(`clicks`) 
    FROM `rh_pages_gsc_country` 
    WHERE `page_id` = pa.`page_id` 
    AND `country` = 'aus'
    AND `date_id` IN 
        (
            SELECT `date_id` 
            FROM `rh_pages_gsc_dates` 
            WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
        )
) as au_clicks,
(
    SELECT 
    COUNT(`keywords_id`) 
    FROM `rh_pages_gsc_keywords` 
    WHERE `page_id` = pa.`page_id`
    AND `date_id` IN 
        (
            SELECT `date_id` 
            FROM `rh_pages_gsc_dates` 
            WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
        )
) as keywords,
(
    SELECT 
    AVG(`position`) 
    FROM `rh_pages_gsc_keywords` 
    WHERE `page_id` = pa.`page_id`
    AND `date_id` IN 
        (
            SELECT `date_id` 
            FROM `rh_pages_gsc_dates` 
            WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
        )
) as avg_pos,
(
    SELECT 
    AVG(`ctr`) 
    FROM `rh_pages_gsc_keywords` 
    WHERE `page_id` = pa.`page_id`
    AND `date_id` IN 
        (
            SELECT `date_id` 
            FROM `rh_pages_gsc_dates` 
            WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
        )
) as avg_ctr
FROM `rh_pages` pa
WHERE pa.`site_id` = 13
ORDER BY au_impressions DESC, keywords DESC, slug DESC

If anyone can help, I don't think the table structure is needed here as it's basically shown in the query, but here is a photo of the constraints and table types.

SQL Tables

Anyone that can help is greatly appreciated.

CodePudding user response:

Your query is aggregating (summarizing) rows from two different detail tables, rh_pages_gsc_country and rh_pages_gsc_keywords, and doing so for a particular date range. And it has a lot of correlated subqueries.

The first steps in your path to better performance are

  • Converting your correlated subqueries to independent subqueries, then JOINing them.
  • Writing one subquery for each detail table, rather than one for each column you need summarized.

You mentioned you've been struggling with this. The concept I hope you learn from this answer is this: you can often refactor away your correlated subqueries if you can come up with independent subqueries that give the same results, and then join them together. If you mention subqueries in your SELECT clause -- SELECT ... (SELECT whatever) whatever ... -- you probably have an opportunity to do this refactoring.

Here goes. First you need a subquery for your date range. You have this one right, just repeated.

           SELECT `date_id` 
            FROM `rh_pages_gsc_dates` 
            WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()

Next you need a subquery for rh_pages_gsc_country. It is a modification of what you have. We'll fetch both SUMs in one subquery.

      SELECT SUM(`impressions`) impressions,
             SUM(`clicks`) clicks, 
             page_id, date_id
        FROM `rh_pages_gsc_country` 
       WHERE `country` = 'aus'
      GROUP BY page_id, date_id

See how this goes? This subquery yields a virtual table with exactly one row for every combination of page_id and date_id, containing the number of impressions and the number of clicks.

Next, let's join the subqueries together in a main query. This yields some columns of your result set.

SELECT pa.slug, country.impressions, country.clicks
  FROM rh_pages pa 
  JOIN (
           SELECT SUM(`impressions`) impressions,
                  SUM(`clicks`) clicks, 
                  page_id, date_id
             FROM `rh_pages_gsc_country` 
            WHERE `country` = 'aus'  -- constant for country code
            GROUP BY page_id, date_id
      ) country ON  country.page_id = pa.page_id
  JOIN (
           SELECT `date_id` 
            FROM `rh_pages_gsc_dates` 
            WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
       ) dates ON dates.date_id = country.date_id
 WHERE pa.site_id = 13             -- constant for page id
 ORDER BY country.impressions DESC

This runs through the rows of rh_pages_gsc_dates and rh_pages_gsc_country just once to satisfy your query. So, faster.

Finally let's do the same thing for your rh_pages_gsc_keywords table's summary.

SELECT pa.slug, country.impressions, country.clicks,
       keywords.keywords, keywords.avg_pos, keywords.avg_ctr
  FROM rh_pages pa 
  JOIN (
           SELECT SUM(`impressions`) impressions,
                  SUM(`clicks`) clicks, 
                  page_id, date_id
             FROM `rh_pages_gsc_country` 
            WHERE `country` = 'aus'  -- constant for country code
            GROUP BY page_id, date_id
      ) country ON  country.page_id = pa.page_id
  JOIN (
           SELECT SUM(`keywords_id`) keywords,
                  AVG(`position`) position,
                  AVG(`ctr`) avg_ctr, 
                  page_id, date_id
             FROM `rh_pages_gsc_keywords` 
            GROUP BY page_id, date_id
       ) keywords ON keywords.page_id = pa.page_id
  JOIN (
           SELECT `date_id` 
            FROM `rh_pages_gsc_keywords` 
            WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
       ) dates ON dates.date_id = country.date_id
              AND dates.date_id = keywords.date_id
 WHERE pa.site_id = 13             -- constant for page id
 ORDER BY impressions DESC, keywords DESC, slug DESC

This will almost certainly be faster than what you have now. If it's fast enough, great. If not, please don't hesitate to ask another question for help, tagging it . We will need to see your table definitions, your index definitions, and the output of EXPLAIN. Please read this before asking a followup question.

I did not, repeat not, debug any of this. That's up to you.

CodePudding user response:

Do NOT normalize any column that will be regularly used in a "range scan", such as date. The following is terribly slow:

  AND  `date_id` IN (
                SELECT  `date_id`
                    FROM  `rh_pages_gsc_dates`
                    WHERE  `date` BETWEEN NOW() - INTERVAL 12 MONTH
                                      AND NOW() ) 

It also consumes extra space to have BIGINT (8 bytes) pointing to a DATE (5 bytes).

Once you move the date to the various tables, the subqueries simplify, such as

        SELECT  AVG(`position`)
            FROM  `rh_pages_gsc_keywords`
            WHERE  `page_id` = pa.`page_id`
              AND  `date_id` IN (
                SELECT  `date_id`
                    FROM  `rh_pages_gsc_dates`
                    WHERE  `date` BETWEEN NOW() - INTERVAL 12 MONTH
                                      AND NOW() ) 

becomes

        SELECT  AVG(`position`)
            FROM  `rh_pages_gsc_keywords`
            WHERE  `page_id` = pa.`page_id`
              AND  `date` >= NOW() - INTERVAL 12 MONTH

I'm assuming that nothing after "NOW" has yet been stored. If there are dates in the future, then add

              AND  `date`  < NOW()

Each table will probably need a new index, such as

INDEX(page_id, date)  -- in that order

(Yes, the "JOIN" suggestion by others is a good one. It's essentially orthogonal to my suggestions above and below.)

After you have made those changes, if the performance is not good enough, we can discuss Summary Tables

  • Related