PostgreSQL full text search - choosing inefficient execution plan


Assume the following query - tables, columns and keys should be pretty obvious (otherwise please ask).

SELECT DISTINCT p.IDProduct
FROM Catalog.Catalog c
INNER JOIN Catalog.Product p ON (
    p.FKIDCatalog=c.IDCatalog
)
INNER JOIN Catalog.ProductLanguage pl ON (
    pl.FKIDProduct=p.IDProduct
    AND (
        pl.FKIDLanguage='de_DE'
        OR pl.FKIDLanguage=c.FKIDLanguage
    )
)
WHERE to_tsvector(SearchConfig, COALESCE(pl.DescriptionShort, '') || ' ' || COALESCE(pl.DescriptionLong, '') || ' ' || COALESCE(pl.KeywordList, '')) @@ to_tsquery('''vorschlaghammer'':*') 
AND c.IDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23)

The IN-clause is determined by user permission and creates a search space of ~1.3M products (out of 2M) with 181 hits - quite a typical use case. Unfortunately it takes 49 seconds to return the result. EXPLAIN (analyze, buffers, format text) shows the following query plan:

 Unique  (cost=59887.83..59887.89 rows=13 width=4) (actual time=48934.329..48972.548 rows=181 loops=1)
   Buffers: shared hit=5386635
   ->  Sort  (cost=59887.83..59887.86 rows=13 width=4) (actual time=48934.328..48972.520 rows=181 loops=1)
         Sort Key: p.idproduct
         Sort Method: quicksort  Memory: 33kB
         Buffers: shared hit=5386635
         ->  Gather  (cost=1045.52..59887.59 rows=13 width=4) (actual time=908.689..48972.460 rows=181 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               Buffers: shared hit=5386635
               ->  Nested Loop  (cost=45.52..58886.29 rows=5 width=4) (actual time=3215.182..48926.270 rows=60 loops=3)
                     Join Filter: (((pl.fkidlanguage)::text = 'de_DE'::text) OR ((pl.fkidlanguage)::text = (c.fkidlanguage)::text))
                     Buffers: shared hit=5386635
                     ->  Hash Join  (cost=45.09..57038.74 rows=1319 width=10) (actual time=0.167..249.085 rows=438115 loops=3)
                           Hash Cond: (p.fkidcatalog = c.idcatalog)
                           Buffers: shared hit=44799
                           ->  Parallel Seq Scan on product p  (cost=0.00..54420.03 rows=979803 width=8) (actual time=0.015..66.259 rows=783365 loops=3)
                                 Buffers: shared hit=44622
                           ->  Hash  (cost=44.98..44.98 rows=9 width=10) (actual time=0.075..0.076 rows=9 loops=3)
                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                 Buffers: shared hit=77
                                 ->  Index Scan using catalog_pkey on catalog c  (cost=0.28..44.98 rows=9 width=10) (actual time=0.033..0.068 rows=9 loops=3)
                                       Index Cond: (idcatalog = ANY ('{5,24,6,7,11,12,8,1,23}'::integer[]))
                                       Buffers: shared hit=77
                     ->  Index Scan using productlanguage_pkey on productlanguage pl  (cost=0.43..1.39 rows=1 width=10) (actual time=0.111..0.111 rows=0 loops=1314345)
                           Index Cond: (fkidproduct = p.idproduct)
                           Filter: (to_tsvector(searchconfig, (((((COALESCE(descriptionshort, ''::character varying))::text || ' '::text) || COALESCE(descriptionlong, ''::text)) || ' '::text) || COALESCE(keywordlist, ''::text))) @@ to_tsquery('''vorschlaghammer'':*'::text))
                           Rows Removed by Filter: 1
                           Buffers: shared hit=5341836
 Planning:
   Buffers: shared hit=65
 Planning Time: 1.905 ms
 Execution Time: 48972.635 ms
(33 rows)

I am not really familiar with execution plans, but I'd say it is unwise to fetch 1.3M products first and then iterate through all of them to check the full-text condition; naturally, the query time drops if I narrow down the set of catalogs, and vice versa. But if I replace the IN-clause with e.g. AND c.IDCatalog<29 (which selects all major catalogs), the query optimizer does what I'd have expected it to do in the first place (likely because it has to consider "almost all" products anyway):
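Concretely, the only change is the final predicate; everything else in the query stays exactly as above:

```sql
-- Original permission-based predicate:
--   AND c.IDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23)
-- Replaced by a range predicate covering all major catalogs:
  AND c.IDCatalog < 29
```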

 Unique  (cost=63069.02..63073.42 rows=37 width=4) (actual time=36.778..39.404 rows=265 loops=1)
   Buffers: shared hit=1395
   ->  Gather Merge  (cost=63069.02..63073.33 rows=37 width=4) (actual time=36.777..39.360 rows=265 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=1395
         ->  Sort  (cost=62068.99..62069.03 rows=15 width=4) (actual time=1.269..1.277 rows=88 loops=3)
               Sort Key: p.idproduct
               Sort Method: quicksort  Memory: 37kB
               Buffers: shared hit=1395
               Worker 0:  Sort Method: quicksort  Memory: 25kB
               Worker 1:  Sort Method: quicksort  Memory: 25kB
               ->  Hash Join  (cost=320.56..62068.70 rows=15 width=4) (actual time=0.926..1.229 rows=88 loops=3)
                     Hash Cond: (p.fkidcatalog = c.idcatalog)
                     Join Filter: (((pl.fkidlanguage)::text = 'de_DE'::text) OR ((pl.fkidlanguage)::text = (c.fkidlanguage)::text))
                     Buffers: shared hit=1381
                     ->  Nested Loop  (cost=294.26..62031.43 rows=4171 width=14) (actual time=0.761..1.039 rows=88 loops=3)
                           Buffers: shared hit=1240
                           ->  Parallel Bitmap Heap Scan on productlanguage pl  (cost=293.83..35768.94 rows=4171 width=10) (actual time=0.756..0.819 rows=88 loops=3)
                                 Recheck Cond: (to_tsvector(searchconfig, (((((COALESCE(descriptionshort, ''::character varying))::text || ' '::text) || COALESCE(descriptionlong, ''::text)) || ' '::text) || COALESCE(keywordlist, ''::text))) @@ to_tsquery('''vorschlaghammer'':*'::text))
                                 Heap Blocks: exact=133
                                 Buffers: shared hit=180
                                 ->  Bitmap Index Scan on productlanguage_descriptionshort_descriptionlong_keywordlist  (cost=0.00..291.33 rows=10010 width=0) (actual time=2.208..2.209 rows=265 loops=1)
                                       Index Cond: (to_tsvector(searchconfig, (((((COALESCE(descriptionshort, ''::character varying))::text || ' '::text) || COALESCE(descriptionlong, ''::text)) || ' '::text) || COALESCE(keywordlist, ''::text))) @@ to_tsquery('''vorschlaghammer'':*'::text))
                                       Buffers: shared hit=47
                           ->  Index Scan using product_pkey on product p  (cost=0.43..6.30 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=265)
                                 Index Cond: (idproduct = pl.fkidproduct)
                                 Buffers: shared hit=1060
                     ->  Hash  (cost=25.99..25.99 rows=25 width=10) (actual time=0.097..0.098 rows=21 loops=3)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           Buffers: shared hit=41
                           ->  Index Scan using catalog_pkey on catalog c  (cost=0.28..25.99 rows=25 width=10) (actual time=0.036..0.085 rows=21 loops=3)
                                 Index Cond: (idcatalog < 29)
                                 Buffers: shared hit=41
 Planning:
   Buffers: shared hit=68
 Planning Time: 1.903 ms
 Execution Time: 39.517 ms
(38 rows)

This is three orders of magnitude faster, and I'd expect PostgreSQL to be able to apply the original IN-clause to the 265 resulting rows in another few milliseconds.

Of course PostgreSQL can only guess which way to go, but it is very unsatisfying when it makes a decision that is so wrong. In fact, a response time of 49 s is completely unacceptable for my users, whereas 40 ms would be barely noticeable. I have never experienced anything similar with non-fulltext queries.

So there are really two questions: a) how can I fix or work around this specific case, and b) how should one approach full-text queries in general in terms of performance?

CodePudding user response:

A major root of the problem seems to be that your hash join of "product" to "catalog" is misestimated by over 300-fold. That has nothing to do with the FTS, so it is possibly just luck that you ran into this problem with an FTS query rather than some other query.

PostgreSQL would agree that it is not a good idea to get 1.3M products first, but it thinks it will need to get about 4000 (1319*3) products.

So why is that? It comes down to p.FKIDCatalog=c.IDCatalog and c.IDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23). The planner estimates this by taking how many rows of p each value of FKIDCatalog matches on average, times 9. But the 9 specific values you list are not average; they are the extremely common ones. If you instead wrote the condition as p.FKIDCatalog=c.IDCatalog and p.FKIDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23), it would estimate the rows it expects to find for each of those 9 specific values and sum them.
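As a sketch, assuming the schema from the question, the rewrite looks like this; it is semantically equivalent (the join condition already forces p.FKIDCatalog = c.IDCatalog), but the IN-list now references p, so the planner can use the per-value frequency statistics of FKIDCatalog on the product table:

```sql
-- Same result set as the original query, but the IN-list is moved from
-- c.IDCatalog to p.FKIDCatalog so the planner sums the estimates for
-- the 9 specific (very common) catalog values instead of assuming
-- average frequency.
SELECT DISTINCT p.IDProduct
FROM Catalog.Catalog c
INNER JOIN Catalog.Product p ON (
    p.FKIDCatalog = c.IDCatalog
)
INNER JOIN Catalog.ProductLanguage pl ON (
    pl.FKIDProduct = p.IDProduct
    AND (
        pl.FKIDLanguage = 'de_DE'
        OR pl.FKIDLanguage = c.FKIDLanguage
    )
)
WHERE to_tsvector(SearchConfig,
          COALESCE(pl.DescriptionShort, '') || ' ' ||
          COALESCE(pl.DescriptionLong, '') || ' ' ||
          COALESCE(pl.KeywordList, ''))
      @@ to_tsquery('''vorschlaghammer'':*')
  AND p.FKIDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23)
```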

Usually PostgreSQL correctly applies the transitive property of equality: if you wrote p.FKIDCatalog=c.IDCatalog and c.IDCatalog=5, it knows it can get a specific estimate for p.FKIDCatalog=5 and use that. But it does not do the same for the transitive property of an IN-list (unless the IN-list is only one item long, in which case it is rewritten to simple equality and the transitive law does apply), even though conceptually it could.

I would also note that the estimate for the full-text index, visible in your other plan, is quite bad too: it expects 4171 rows but finds only 88. I don't know why it is so far off; in my hands tv @@ tq is usually estimated better than that. Has the table been ANALYZEd recently, at least since the expression index was added?
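If stale statistics are the cause, re-analyzing the table is a cheap first step. Note that PostgreSQL only gathers statistics for an expression index when the table is analyzed after the index exists (table name taken from the plans in the question; adjust if yours differs):

```sql
-- Refresh planner statistics for productlanguage, including the
-- statistics attached to the expression index used by the FTS query.
-- Safe to run at any time; it only reads the table.
ANALYZE Catalog.ProductLanguage;
```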

Fixing either of these alone might be enough to shift the plan to the faster one.
