Why does the Postgres execution plan change drastically depending on the WHERE condition?

I am running the same SQL query with different values in the WHERE clause. One query takes significantly longer than the other, and the two execution plans are different as well.

Query1 and Execution Plan:

explain analyze
select t."postal_code"
from dev."postal_master" t
   left join dev."premise_master" f
      on t."primary_code" = f."primary_code"
         and t."name" = f."name"
         and t."final_code" = f."final_code"
where 1 = 1  and t."region" = 'US' 
  and t."name" = 'UBQ' 
  and t."accountModCode" = 'LTI' 
  and t."modularity_code" = 'PHA'
group by t."postal_code", t."modularity_code", t."region",
      t."feature", t."granularity"

Group  (cost=4.19..4.19 rows=1 width=38) (actual time=76411.456..76414.348 rows=11871 loops=1)
  Group Key: t."postal_code", t."modularity_code", t."region", t."feature", t.granularity
  ->  Sort  (cost=4.19..4.19 rows=1 width=38) (actual time=76411.452..76412.045 rows=11879 loops=1)
        Sort Key: t."postal_code", t."feature", t.granularity
        Sort Method: quicksort  Memory: 2055kB
        ->  Nested Loop Left Join  (cost=0.17..4.19 rows=1 width=38) (actual time=45.373..76362.219 rows=11879 loops=1)
              Join Filter: (((t."name")::text = (f."name")::text) AND ((t."primary_code")::text = (f."primary_code")::text) AND ((t."final_code")::text = (f."final_code")::text))
              Rows Removed by Join Filter: 150642887
              ->  Index Scan using idx_postal_code_source on postal_master t  (cost=0.09..2.09 rows=1 width=72) (actual time=36.652..154.339 rows=11871 loops=1)
                    Index Cond: (("name")::text = 'UBQ'::text)
                    Filter: ((("region")::text = 'US'::text) AND (("accountModCode")::text = 'LTI'::text) AND (("modularity_code")::text = 'PHA'::text))
                    Rows Removed by Filter: 550164
              ->  Index Scan using idx_postal_master_source on premise_master f  (cost=0.08..2.09 rows=1 width=35) (actual time=0.016..3.720 rows=12690 loops=11871)
                    Index Cond: (("name")::text = 'UBQ'::text)
Planning Time: 1.196 ms
Execution Time: 76415.004 ms

Query2 and Execution plan:

explain analyze
select t."postal_code"
from dev."postal_master" t
   left join dev."premise_master" f
      on t."primary_code" = f."primary_code"
         and t."name" = f."name"
         and t."final_code" = f."final_code"
where 1 = 1  and t."region" = 'DE' 
  and t."name" = 'EME' 
  and t."accountModCode" = 'QEW' 
  and t."modularity_code" = 'NFX'
group by t."postal_code", t."modularity_code", t."region",
         t."feature", t."granularity"

Group  (cost=50302.96..50426.04 rows=1330 width=38) (actual time=170.687..184.772 rows=8230 loops=1)
  Group Key: t."postal_code", t."modularity_code", t."region", t."feature", t.granularity
  ->  Gather Merge  (cost=50302.96..50423.27 rows=1108 width=38) (actual time=170.684..182.965 rows=8230 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Group  (cost=49302.95..49304.62 rows=554 width=38) (actual time=164.446..165.613 rows=2743 loops=3)
              Group Key: t."postal_code", t."modularity_code", t."region", t."feature", t.granularity
              ->  Sort  (cost=49302.95..49303.23 rows=554 width=38) (actual time=164.444..164.645 rows=3432 loops=3)
                    Sort Key: t."postal_code", t."feature", t.granularity
                    Sort Method: quicksort  Memory: 550kB
                    Worker 0:  Sort Method: quicksort  Memory: 318kB
                    Worker 1:  Sort Method: quicksort  Memory: 322kB
                    ->  Nested Loop Left Join  (cost=1036.17..49297.90 rows=554 width=38) (actual time=2.143..148.372 rows=3432 loops=3)
                          ->  Parallel Bitmap Heap Scan on territory_postal_mapping t  (cost=1018.37..38323.78 rows=554 width=72) (actual time=1.898..11.849 rows=2743 loops=3)
                                Recheck Cond: ((("accountModCode")::text = 'QEW'::text) AND (("region")::text = 'DE'::text) AND (("name")::text = 'EME'::text))
                                Filter: (("modularity_code")::text = 'NFX'::text)
                                Rows Removed by Filter: 5914
                                Heap Blocks: exact=2346
                                ->  Bitmap Index Scan on territorypostal__source_region_mod  (cost=0.00..1018.31 rows=48088 width=0) (actual time=4.783..4.783 rows=25973 loops=1)
                                      Index Cond: ((("accountModCode")::text = 'QEW'::text) AND (("region")::text = 'DE'::text) AND (("name")::text = 'EME'::text))
                          ->  Bitmap Heap Scan on premise_master f  (cost=17.80..19.81 rows=1 width=35) (actual time=0.047..0.048 rows=1 loops=8230)
                                Recheck Cond: (((t."primary_code")::text = ("primary_code")::text) AND ((t."final_code")::text = ("final_code")::text))
                                Filter: ((("name")::text = 'EME'::text) AND ((t."name")::text = ("name")::text))
                                Heap Blocks: exact=1955
                                ->  BitmapAnd  (cost=17.80..17.80 rows=1 width=0) (actual time=0.046..0.046 rows=0 loops=8230)
                                      ->  Bitmap Index Scan on premise_master__accountprimarypostal  (cost=0.00..1.95 rows=105 width=0) (actual time=0.008..0.008 rows=24 loops=8230)
                                            Index Cond: ((t."primary_code")::text = ("primary_code")::text)
                                      ->  Bitmap Index Scan on premise_master__accountfinalterritorycode  (cost=0.00..15.80 rows=1403 width=0) (actual time=0.065..0.065 rows=559 loops=4568)
                                            Index Cond: ((t."final_code")::text = ("final_code")::text)
Planning Time: 1.198 ms
Execution Time: 185.197 ms

I am aware that the number of rows differs depending on the WHERE condition, but is that the only reason for the different execution plans? Also, how can I improve the performance of the first query?

CodePudding user response：

The row estimates are far off for the first query (the index scan on postal_master is estimated at 1 row but returns 11871), so it is no surprise that PostgreSQL picks a bad plan. Try these measures one after the other and see if they help:

Collect statistics:
```
ANALYZE premise_master, postal_master;
```
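
The tables in the question live in the dev schema, so the unqualified names above only resolve if dev is on the search_path. A minimal sketch, assuming that schema, that first checks how stale the statistics are and then refreshes them with schema-qualified names:

```sql
-- When were statistics last gathered, and how many rows changed since then?
-- Assumes the tables are in schema "dev", as in the question's queries.
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_live_tup,
       n_mod_since_analyze
FROM pg_stat_user_tables
WHERE schemaname = 'dev'
  AND relname IN ('premise_master', 'postal_master');

-- Schema-qualified form of the ANALYZE above.
ANALYZE dev."premise_master", dev."postal_master";
```

A large n_mod_since_analyze relative to n_live_tup, or empty last_analyze and last_autoanalyze columns, would be consistent with stale statistics.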

Calculate more precise statistics:

```
ALTER TABLE premise_master ALTER name SET statistics 1000;
ALTER TABLE postal_master ALTER name SET statistics 1000;
ANALYZE premise_master, postal_master;
```
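
To see what the planner currently knows about the name column, the pg_stats view can be inspected before and after raising the statistics target. This is only a sketch, again assuming the tables are in the dev schema:

```sql
-- Per-column statistics the planner uses for "name" on both tables.
SELECT tablename,
       attname,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE schemaname = 'dev'
  AND tablename IN ('premise_master', 'postal_master')
  AND attname = 'name';
```

If 'UBQ' is missing from most_common_vals, or listed with a very small frequency, that would fit the rows=1 estimates in the first plan. A higher statistics target lets ANALYZE keep a longer most-common-values list.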

CodePudding user response：

The first line of each EXPLAIN ANALYZE output shows that the planner expected only 1 row from the first query, while it expected 1330 from the second, which is probably why it chose a less efficient plan for the first. That usually means the table statistics are out of date: when they were last gathered, few rows would have matched the first query (maybe the data was being loaded in alphabetical order?). In this case the fix is to run ANALYZE dev."postal_master" to refresh the statistics.
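
The same mismatch shows up on the inner side of the join: in the first plan the index scan on premise_master is estimated at rows=1 but returns 12690 rows on each of its 11871 loops. One way to check a single estimate in isolation, sketched here with the literal from the first query, is to compare the planner's guess with an actual count:

```sql
-- Planner's estimate for the filter on its own (look at "rows=" in the output).
EXPLAIN
SELECT *
FROM dev."premise_master" f
WHERE f."name" = 'UBQ';

-- Actual number of matching rows, for comparison.
SELECT count(*)
FROM dev."premise_master" f
WHERE f."name" = 'UBQ';
```

Running both before and after ANALYZE shows whether refreshing the statistics brings the estimate closer to the real count.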

You could also try removing the GROUP BY clause entirely (if your tooling allows). I could be misreading but it doesn't look like it's affecting the output much. If that results in unwanted duplicates you can use select distinct t.postal_code instead of the group by.
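
As a rough sketch of the DISTINCT variant, here is the first query with the GROUP BY dropped and the redundant 1 = 1 removed:

```sql
-- First query from the question, rewritten with SELECT DISTINCT.
SELECT DISTINCT t."postal_code"
FROM dev."postal_master" t
   LEFT JOIN dev."premise_master" f
      ON t."primary_code" = f."primary_code"
         AND t."name" = f."name"
         AND t."final_code" = f."final_code"
WHERE t."region" = 'US'
  AND t."name" = 'UBQ'
  AND t."accountModCode" = 'LTI'
  AND t."modularity_code" = 'PHA';
```

Note that this is not strictly equivalent to the original: the GROUP BY also includes feature and granularity, so the original can return the same postal_code more than once, while DISTINCT on postal_code alone returns each value once. Whether that difference matters depends on how the result is used.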