Why does the query planner start picking a worse plan after a certain number of joins inside a CTE?


I have the following query:

EXPLAIN ANALYZE
WITH up AS (
  SELECT
    ua1.id ua1_id,
    gp1.id gp1_id,
    upai1.id upai1_id,
    upbi1.id upbi1_id,
    upiv1.id upiv1_id,
    vi.id vi_id,
    c.id c_id,
    sua.id sua_id
  FROM ua1
  LEFT JOIN gp1 ON
    ua1.gp1_id = gp1.id
  LEFT JOIN upai1 ON
    upai1.id = ua1.upai1_id
  LEFT JOIN upbi1 ON
    upbi1.id = ua1.upbi1_id
  LEFT JOIN upiv1 ON
    upiv1.id = ua1.upiv1_id
  LEFT JOIN vi ON
    vi.id = ua1.vi_id
  LEFT JOIN c ON
    c.id = vi.c_id
  LEFT JOIN sua ON
    sua.ua_id = ua1.id  
)
SELECT
  up1.*,
  hrm1.ua_id,
  hrm1.hr_id
FROM hrm1 hrm1
INNER JOIN up up1 ON
  up1.ua1_id = hrm1.ua_id
WHERE
  hrm1.hr_id = 1

which results in the following query plan:

Gather  (cost=42072.86..122816.40 rows=6 width=40) (actual time=7.207..10.132 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Hash Join  (cost=41072.86..121815.80 rows=2 width=40) (actual time=0.068..0.070 rows=0 loops=3)
         Hash Cond: (ua1.id = hrm1.ua_id)
         ->  Parallel Hash Left Join  (cost=41045.41..121231.60 rows=212096 width=32) (never executed)
               Hash Cond: (ua1.id = sua.ua_id)
               ->  Hash Left Join  (cost=36200.82..115830.26 rows=212096 width=28) (never executed)
                     Hash Cond: (vi.c_id = c.id)
                     ->  Parallel Hash Left Join  (cost=36192.93..115254.76 rows=212096 width=28) (never executed)
                           Hash Cond: (ua1.vi_id = vi.id)
                           ->  Hash Left Join  (cost=11544.90..85850.97 rows=212096 width=24) (never executed)
                                 Hash Cond: (ua1.upiv1_id = upiv1.id)
                                 ->  Hash Left Join  (cost=11506.32..85255.64 rows=212096 width=24) (never executed)
                                       Hash Cond: (ua1.upbi1_id = upbi1.id)
                                       ->  Parallel Hash Left Join  (cost=11216.81..84409.38 rows=212096 width=24) (never executed)
                                             Hash Cond: (ua1.upai1_id = upai1.id)
                                             ->  Hash Left Join  (cost=677.14..70177.96 rows=212096 width=24) (never executed)
                                                   Hash Cond: (ua1.location_gp1_id = gp1.id)
                                                   ->  Parallel Seq Scan on ua ua1  (cost=0.00..68943.96 rows=212096 width=24) (never executed)
                                                   ->  Hash  (cost=478.69..478.69 rows=15876 width=4) (never executed)
                                                         ->  Index Only Scan using gp1_pkey on gp1 gp1  (cost=0.29..478.69 rows=15876 width=4) (never executed)
                                                               Heap Fetches: 0
                                             ->  Parallel Hash  (cost=7815.85..7815.85 rows=165985 width=4) (never executed)
                                                   ->  Parallel Seq Scan on upai1 upai1  (cost=0.00..7815.85 rows=165985 width=4) (never executed)
                                       ->  Hash  (cost=170.34..170.34 rows=9534 width=4) (never executed)
                                             ->  Seq Scan on upbi1 upbi1  (cost=0.00..170.34 rows=9534 width=4) (never executed)
                                 ->  Hash  (cost=22.70..22.70 rows=1270 width=4) (never executed)
                                       ->  Seq Scan on upiv1 upiv1  (cost=0.00..22.70 rows=1270 width=4) (never executed)
                           ->  Parallel Hash  (cost=17455.57..17455.57 rows=438357 width=8) (never executed)
                                 ->  Parallel Seq Scan on vi vi  (cost=0.00..17455.57 rows=438357 width=8) (never executed)
                     ->  Hash  (cost=5.17..5.17 rows=217 width=4) (never executed)
                           ->  Seq Scan on c c  (cost=0.00..5.17 rows=217 width=4) (never executed)
               ->  Parallel Hash  (cost=4679.82..4679.82 rows=13182 width=8) (never executed)
                     ->  Parallel Seq Scan on sua sua  (cost=0.00..4679.82 rows=13182 width=8) (never executed)
         ->  Hash  (cost=27.37..27.37 rows=6 width=8) (actual time=0.021..0.021 rows=0 loops=3)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Index Scan using hrm1_hr_id_idx on hrm1 hrm1  (cost=0.42..27.37 rows=6 width=8) (actual time=0.020..0.020 rows=0 loops=3)
                     Index Cond: (hr_id = 1)
Planning Time: 5.125 ms
Execution Time: 10.223 ms

Removing one or more of the joins inside the CTE (it does not appear to matter which), however, causes the planner to switch to nested loop joins, which is much faster. Note: in this simplified example the difference is not that big, but in the final query, using nested loop joins instead of hash joins was the difference between taking less than a millisecond and taking over a second.

Nested Loop Left Join  (cost=2.83..91.77 rows=6 width=36) (actual time=0.016..0.017 rows=0 loops=1)
   ->  Nested Loop Left Join  (cost=2.42..88.95 rows=6 width=32) (actual time=0.015..0.016 rows=0 loops=1)
         ->  Nested Loop Left Join  (cost=1.99..85.80 rows=6 width=32) (actual time=0.015..0.016 rows=0 loops=1)
               ->  Nested Loop Left Join  (cost=1.83..84.76 rows=6 width=32) (actual time=0.015..0.016 rows=0 loops=1)
                     ->  Nested Loop Left Join  (cost=1.55..82.95 rows=6 width=32) (actual time=0.015..0.016 rows=0 loops=1)
                           ->  Nested Loop Left Join  (cost=1.13..79.83 rows=6 width=32) (actual time=0.015..0.016 rows=0 loops=1)
                                 ->  Nested Loop  (cost=0.84..78.01 rows=6 width=32) (actual time=0.015..0.016 rows=0 loops=1)
                                       ->  Index Scan using hrm1_hr_id_idx on hrm1 hrm1  (cost=0.42..27.37 rows=6 width=8) (actual time=0.015..0.015 rows=0 loops=1)
                                             Index Cond: (hr_id = 6766566)
                                       ->  Index Scan using ua_pkey on ua ua1  (cost=0.42..8.44 rows=1 width=24) (never executed)
                                             Index Cond: (id = hrm1.ua_id)
                                 ->  Index Only Scan using gp1_pkey on gp1 gp1  (cost=0.29..0.30 rows=1 width=4) (never executed)
                                       Index Cond: (id = ua1.location_gp1_id)
                                       Heap Fetches: 0
                           ->  Index Only Scan using upai1_pkey on upai1 upai1  (cost=0.42..0.52 rows=1 width=4) (never executed)
                                 Index Cond: (id = ua1.upai1_id)
                                 Heap Fetches: 0
                     ->  Index Only Scan using upbi1_pkey on upbi1 upbi1  (cost=0.29..0.30 rows=1 width=4) (never executed)
                           Index Cond: (id = ua1.upbi1_id)
                           Heap Fetches: 0
               ->  Index Only Scan using upiv1_pkey on upiv1 upiv1  (cost=0.15..0.17 rows=1 width=4) (never executed)
                     Index Cond: (id = ua1.upiv1_id)
                     Heap Fetches: 0
         ->  Index Only Scan using vi_id_pkey on vi vi  (cost=0.43..0.52 rows=1 width=4) (never executed)
               Index Cond: (id = ua1.vi_id)
               Heap Fetches: 0
   ->  Index Scan using sua_ua_id_uniq_idx on sua sua  (cost=0.41..0.47 rows=1 width=8) (never executed)
         Index Cond: (ua_id = ua1.id)
Planning Time: 7.002 ms
Execution Time: 0.089 ms

Similarly, eliminating the CTE and using a simplified SELECT also resulted in the better plan with nested loop joins.

I'd like to understand why the addition of one more join causes the query planner to pick a less efficient plan, and what steps we could take (such as adopting a convention of not using CTEs) to keep us from running into this same issue again as our queries evolve and grow.

CodePudding user response:

Looks like you exceeded the join_collapse_limit (which is 8 by default), so Postgres, defending against mushrooming planning cost, stops trying to reorder joins to optimize the query plan.

"But the CTE only joins 8 tables!" I hear you say.

Well, assuming at least Postgres 12, plain CTEs can be inlined into the main query. That's exactly what I see happening here: the eight tables joined inside the CTE plus hrm1 in the outer query make 9 tables. Boom.
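
You can check the limit that applies to your session with the standard settings (shown here just for reference):

SHOW join_collapse_limit;   -- 8 by default
SHOW from_collapse_limit;   -- also 8 by default; the analogous limit for flattened subqueries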

Possible solutions

You can try to increase that limit, but be aware that the number of possible join orders, and hence the planning cost, grows factorially (!). Note the already dominating cost for query planning here:

Planning Time: 7.002 ms
Execution Time: 0.089 ms
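
If you do raise it, a minimal sketch; the value 12 and the database name mydb are just placeholders, and you'd pick a limit at least as large as the number of tables you actually join:

-- per session, or per transaction with SET LOCAL
SET join_collapse_limit = 12;
SET from_collapse_limit = 12;   -- usually raised in tandem

-- or persistently for one database (placeholder name)
ALTER DATABASE mydb SET join_collapse_limit = 12;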

For repeated execution, a prepared statement (or a PL/pgSQL function, which caches query plans in a similar fashion) might help a lot here ...
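
For illustration, a sketch of the prepared-statement route reusing the tables from the question; the statement name up_by_hr is made up:

PREPARE up_by_hr (int) AS
WITH up AS (
  SELECT
    ua1.id ua1_id, gp1.id gp1_id, upai1.id upai1_id, upbi1.id upbi1_id,
    upiv1.id upiv1_id, vi.id vi_id, c.id c_id, sua.id sua_id
  FROM ua1
  LEFT JOIN gp1   ON ua1.gp1_id = gp1.id
  LEFT JOIN upai1 ON upai1.id = ua1.upai1_id
  LEFT JOIN upbi1 ON upbi1.id = ua1.upbi1_id
  LEFT JOIN upiv1 ON upiv1.id = ua1.upiv1_id
  LEFT JOIN vi    ON vi.id = ua1.vi_id
  LEFT JOIN c     ON c.id = vi.c_id
  LEFT JOIN sua   ON sua.ua_id = ua1.id
)
SELECT up1.*, hrm1.ua_id, hrm1.hr_id
FROM hrm1
INNER JOIN up up1 ON up1.ua1_id = hrm1.ua_id
WHERE hrm1.hr_id = $1;

-- The first few executions still build custom plans; after that Postgres may
-- switch to a cached generic plan and skip the repeated planning cost.
EXECUTE up_by_hr(1);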

Or you force the CTE to be materialized with the MATERIALIZED keyword, which also forces it to be planned separately:

WITH up AS MATERIALIZED ( ...

Downside: the added cost of materializing the intermediate result when that isn't actually needed.

Or you reorder the joins by hand to optimize the query plan, which may be subject to bitrot as cardinalities in the underlying DB change ...
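
As a sketch of that last option, assuming you drop the CTE and start from the selective hrm1 filter; with join_collapse_limit = 1 the planner keeps the explicit JOIN order exactly as written:

BEGIN;
SET LOCAL join_collapse_limit = 1;   -- follow the written join order, for this transaction only

SELECT
  ua1.id ua1_id, gp1.id gp1_id, upai1.id upai1_id, upbi1.id upbi1_id,
  upiv1.id upiv1_id, vi.id vi_id, c.id c_id, sua.id sua_id,
  hrm1.ua_id, hrm1.hr_id
FROM hrm1
INNER JOIN ua1   ON ua1.id = hrm1.ua_id      -- most selective join first
LEFT JOIN gp1    ON gp1.id = ua1.gp1_id
LEFT JOIN upai1  ON upai1.id = ua1.upai1_id
LEFT JOIN upbi1  ON upbi1.id = ua1.upbi1_id
LEFT JOIN upiv1  ON upiv1.id = ua1.upiv1_id
LEFT JOIN vi     ON vi.id = ua1.vi_id
LEFT JOIN c      ON c.id = vi.c_id
LEFT JOIN sua    ON sua.ua_id = ua1.id
WHERE hrm1.hr_id = 1;

COMMIT;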
