Home > Software design >  Optimize slow query when multiple JOIN LATERAL are present
Optimize slow query when multiple JOIN LATERAL are present

Time:11-15

This is a follow-up optimization on Adding a SUM group by into a COUNT DISTINCT query (albeit with some optimizations and joins simplifications).

I wonder if it's possible to optimize the following PostgreSQL 13.1 query which took 130322.2ms to complete. Normally if only one JOIN LATERAL is present it does it in a few ms.

What I'm most lost is given that each JOIN LATERAL has an ON with a condition based on the score of its own subquery, how could I optimize the query potentially reducing the number of JOIN LATERAL and still get the same results.

From what I see, it seems it gets slow when conditional OR are added to some of the WHERE inside the JOIN LATERAL instead of AND. See:

SELECT count(*)
FROM subscriptions q
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        AND ts.tag_id = 21
    ) AS q62958 ON q62958.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 32
    ) AS q120342 ON q120342.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 35
    ) AS q992506 ON q992506.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 33
    ) AS q343255 ON q343255.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 29
    ) AS q532052 ON q532052.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 30
    ) AS q268437 ON q268437.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        AND ts.tag_id = 46
    ) AS q553964 ON q553964.sum_score >= 3
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        AND ts.tag_id = 24
    ) AS q928243 ON q928243.sum_score >= 2
WHERE
  q.state = 'subscribed' AND q.app_id = 4
;

The subscriptions table has less than 15000 rows and less than 2000 matching the WHERE clause. Both q.state and q.app_id have indexes.

The complete EXPLAIN ANALYZE: https://explain.depesz.com/s/Ok0h

CodePudding user response:

The main issue is that the query is wrong:

 WHERE
            qa.quiz_id = q.quiz_id
            OR ts.tag_id = 32

The OR is misplaced here and must be introduced after aggregating the right rows. This WHERE clause includes all rows with either a matching quiz_id or tag_id = 32. So all rows, which is nonsense.

In addition to that, you can conflate multiple LATERAL subqueries with conditional aggregates like this:

SELECT count(*)
FROM   subscriptions q
JOIN   LATERAL (
   SELECT sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
        , sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
        , sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
     -- , more?
   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id
   WHERE  qa.quiz_id = q.quiz_id
   AND    ts.tag_id IN (21, 32, 35) -- more?
   ) AS t ON t.sum_score21 <= 1
          OR t.sum_score32 <= 1
          OR t.sum_score35 <= 1
       -- AND / OR more?
WHERE  q.state = 'subscribed'
AND    q.app_id = 4;

Be aware of operator precedence when adding more conditions with AND or OR: You may need parentheses as AND binds before OR.

About aggregated FILTER:

A multicolumn index on subscriptions(app_id, state, quizz_id) might help (give you index-only scans). But since the table isn't that big, that's not important.

LATERAL (rather than a plain subquery) still makes sense while the outer filter eliminates most rows from table subscriptions. A multicolumn index on tag_scores(answer_id, tag_id) might help.

With more tags in the subquery and/or more subscriptions, the LATERAL variant and the usefulness of said index fall off.

For comparison, here is a variant with plain subquery:

SELECT count(*)
FROM   subscriptions q
JOIN  (
   SELECT qa.quiz_id
        , sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
        , sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
        , sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
     -- , more?
   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id
   WHERE  ts.tag_id IN (21, 32, 35) -- more?
   GROUP  BY qa.quiz_id
   ) AS t USING (quiz_id)
WHERE  q.state = 'subscribed'
AND    q.app_id = 4
AND   (t.sum_score21 <= 1
    OR t.sum_score32 <= 1
    OR t.sum_score35 <= 1)
    -- AND / OR more?
;

Either way, t.sum_score21 <= 1 qualifies if ...

  • at least one row with tag 21 is associated
  • and the summed score of all with tag 21 for the same quiz is <= 1

Seems like a pretty narrow filter.

Denoise:

If the row in answers is never missing (referential integrity enforced with FK constraints?), you can cut out the middleman here:

   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id

-->

   FROM   quiz_answers qa
   JOIN   tag_scores   ts USING (answer_id)
  • Related