Home > Mobile >  Postgres choosing a query plan that is more expensive by its own estimates
Postgres choosing a query plan that is more expensive by its own estimates

Time:02-18

I have the following 2 query plans for a particular query (second one was obtained by turning seqscan off):

seqscan is on

seqscan is off

The cost estimate for the second plan is lower than that for the first, however, pg only chooses the second plan if forced to do so (by turning seqscan off).

What could be causing this behaviour?


EDIT: Updating the question with information requested in a comment:

Output for EXPLAIN (ANALYZE, BUFFERS, VERBOSE) for query 1 (seqscan on; does not use index). Also viewable at https://explain.depesz.com/s/cGLY:

QUERY PLAN
Limit  (cost=2449.76..840962.24 rows=1 width=87) (actual time=25701.021..26540.060 rows=10 loops=1)
  Output: books.id, books.title, books.authors, books.meta
  Buffers: shared hit=2254959
  ->  Nested Loop Left Join  (cost=2449.76..840962.24 rows=1 width=87) (actual time=25289.899..26128.923 rows=10 loops=1)
        Output: books.id, books.title, books.authors, books.meta
        Join Filter: (photos."bookId" = books.id)
        Rows Removed by Join Filter: 62876457
        Filter: (photos.id IS NULL)
        Rows Removed by Filter: 707
        Buffers: shared hit=2254959
        ->  Gather  (cost=2449.76..835403.18 rows=1 width=87) (actual time=391.874..494.669 rows=658 loops=1)
              Output: books.id, books.title, books.authors, books.meta
              Workers Planned: 2
              Workers Launched: 2
              Buffers: shared hit=11837
              ->  Parallel Bitmap Heap Scan on public.books  (cost=1449.76..834403.08 rows=1 width=87) (actual time=868.495..874.706 rows=554 loops=3)
                    Output: books.id, books.title, books.authors, books.meta
                    Recheck Cond: ((books.meta !~~ 'foo%'::text) AND (books.meta <> 'bar'::text))
                    Filter: ((books.meta ~~ 'baz%'::text) AND (books.id <> ALL ('{19643405,19702275,19784617,28454289,28491188,28491190,28491205,28521585,28521596,28521627,28521638,28521649,28521658,28521678,28521680,28521689,28521700,28518165,28515245,28515256,28515288,28515299,28515310,28515342,28515353,28515364,28515407,28515736,28518100,28518219,28518273,28518370,28518424,28518478,28518489}'::integer[])))
                    Rows Removed by Filter: 77897
                    Heap Blocks: exact=11567
                    Buffers: shared hit=11837
                    Worker 0:  actual time=1107.154..1115.320 rows=1113 loops=1
                      JIT:
                        Functions: 6
                        Options: Inlining true, Optimization true, Expressions true, Deforming true
                        Timing: Generation 5.001 ms, Inlining 471.271 ms, Optimization 365.866 ms, Emission 269.821 ms, Total 1111.959 ms
                      Buffers: shared hit=40
                    Worker 1:  actual time=1108.335..1108.975 rows=541 loops=1
                      JIT:
                        Functions: 6
                        Options: Inlining true, Optimization true, Expressions true, Deforming true
                        Timing: Generation 11.915 ms, Inlining 450.341 ms, Optimization 364.168 ms, Emission 293.461 ms, Total 1119.885 ms
                      Buffers: shared hit=21
                    ->  Bitmap Index Scan on books_meta_partial_exclude_foo_and_bar  (cost=0.00..1449.76 rows=2194002 width=0) (actual time=41.801..41.802 rows=238689 loops=1)
                          Buffers: shared hit=209
        ->  Seq Scan on public.photos  (cost=0.00..4364.58 rows=95558 width=8) (actual time=0.002..17.127 rows=95558 loops=658)
              Output: photos.id, photos.url, photos.type, photos."userId", photos."libraryId", photos."bookId", photos."libraryBookId", photos."isPrimaryPic", photos."processingStatus", photos."createdAt", photos."updatedAt", photos."otherData"
              Buffers: shared hit=2243122
Planning:
  Buffers: shared hit=17
Planning Time: 0.758 ms
JIT:
  Functions: 24
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 20.953 ms, Inlining 1005.367 ms, Optimization 915.620 ms, Emission 705.338 ms, Total 2647.278 ms
Execution Time: 26544.310 ms

Output for EXPLAIN (ANALYZE, BUFFERS, VERBOSE) for query 2 (seqscan off; uses index). Also viewable at https://explain.depesz.com/s/VDfP:

QUERY PLAN
Limit  (cost=2450.18..835405.63 rows=1 width=87) (actual time=1110.719..2424.086 rows=10 loops=1)
  Output: books.id, books.title, books.authors, books.meta
  Buffers: shared hit=16834
  ->  Nested Loop Left Join  (cost=2450.18..835405.63 rows=1 width=87) (actual time=464.812..1778.175 rows=10 loops=1)
        Output: books.id, books.title, books.authors, books.meta
        Filter: (photos.id IS NULL)
        Rows Removed by Filter: 1321
        Buffers: shared hit=16834
        ->  Gather  (cost=2449.76..835403.18 rows=1 width=87) (actual time=411.878..1753.914 rows=1232 loops=1)
              Output: books.id, books.title, books.authors, books.meta
              Workers Planned: 2
              Workers Launched: 2
              Buffers: shared hit=11822
              ->  Parallel Bitmap Heap Scan on public.books  (cost=1449.76..834403.08 rows=1 width=87) (actual time=653.691..663.053 rows=411 loops=3)
                    Output: books.id, books.title, books.authors, books.meta
                    Recheck Cond: ((books.meta !~~ 'foo%'::text) AND (books.meta <> 'bar'::text))
                    Filter: ((books.meta ~~ 'baz%'::text) AND (books.id <> ALL ('{19643405,19702275,19784617,28454289,28491188,28491190,28491205,28521585,28521596,28521627,28521638,28521649,28521658,28521678,28521680,28521689,28521700,28518165,28515245,28515256,28515288,28515299,28515310,28515342,28515353,28515364,28515407,28515736,28518100,28518219,28518273,28518370,28518424,28518478,28518489}'::integer[])))
                    Rows Removed by Filter: 77893
                    Heap Blocks: exact=11611
                    Buffers: shared hit=11822
                    Worker 0:  actual time=774.890..774.891 rows=1 loops=1
                      JIT:
                        Functions: 6
                        Options: Inlining true, Optimization true, Expressions true, Deforming true
                        Timing: Generation 14.889 ms, Inlining 364.167 ms, Optimization 205.348 ms, Emission 205.226 ms, Total 789.630 ms
                      Buffers: shared hit=1
                    Worker 1:  actual time=780.309..780.311 rows=1 loops=1
                      JIT:
                        Functions: 6
                        Options: Inlining true, Optimization true, Expressions true, Deforming true
                        Timing: Generation 4.595 ms, Inlining 362.465 ms, Optimization 209.509 ms, Emission 208.145 ms, Total 784.715 ms
                      Buffers: shared hit=1
                    ->  Bitmap Index Scan on books_meta_partial_exclude_foo_and_bar  (cost=0.00..1449.76 rows=2194002 width=0) (actual time=56.500..56.501 rows=238689 loops=1)
                          Buffers: shared hit=209
        ->  Index Scan using "photos_bookId_idx" on public.photos  (cost=0.42..2.44 rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1232)
              Output: photos.id, photos.url, photos.type, photos."userId", photos."libraryId", photos."bookId", photos."libraryBookId", photos."isPrimaryPic", photos."processingStatus", photos."createdAt", photos."updatedAt", photos."otherData"
              Index Cond: (photos."bookId" = books.id)
              Buffers: shared hit=5012
Planning:
  Buffers: shared hit=17
Planning Time: 2.640 ms
JIT:
  Functions: 25
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 39.565 ms, Inlining 839.818 ms, Optimization 765.817 ms, Emission 599.027 ms, Total 2244.228 ms
Execution Time: 2455.226 ms

EDIT 2: adding information on table structure, indexes, and the query itself

-- Table: public.books

-- DROP TABLE IF EXISTS public.books;

CREATE TABLE IF NOT EXISTS public.books
(
    id integer NOT NULL DEFAULT nextval('books_id_seq'::regclass),
    title text COLLATE pg_catalog."default" NOT NULL,
    authors text COLLATE pg_catalog."default" NOT NULL,
    slug text COLLATE pg_catalog."default" NOT NULL,
    "desc" text COLLATE pg_catalog."default",
    meta text COLLATE pg_catalog."default",
    "createdAt" timestamp(3) without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    "updatedAt" timestamp(3) without time zone NOT NULL,
    tsv tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig, ((COALESCE(title, ''::text) || ' '::text) || COALESCE(authors, ''::text)))) STORED,
    CONSTRAINT books_pkey PRIMARY KEY (id)
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.books
    OWNER to [REDACTED];
-- Index: books_fts_idx

-- DROP INDEX IF EXISTS public.books_fts_idx;

CREATE INDEX IF NOT EXISTS books_fts_idx
    ON public.books USING gin
    (tsv)
    TABLESPACE pg_default;
-- Index: books_meta_partial_exclude_foo_and_bar

-- DROP INDEX IF EXISTS public.books_meta_partial_exclude_foo_and_bar;

CREATE INDEX IF NOT EXISTS books_meta_partial_exclude_foo_and_bar
    ON public.books USING btree
    (meta COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default
    WHERE meta !~~ 'foo%'::text AND meta <> 'bar'::text;
-- Index: books_slug_key

-- DROP INDEX IF EXISTS public.books_slug_key;

CREATE UNIQUE INDEX IF NOT EXISTS books_slug_key
    ON public.books USING btree
    (slug COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;
-- Table: public.photos

-- DROP TABLE IF EXISTS public.photos;

CREATE TABLE IF NOT EXISTS public.photos
(
    id integer NOT NULL DEFAULT nextval('photos_id_seq'::regclass),
    url text COLLATE pg_catalog."default" NOT NULL,
    type text COLLATE pg_catalog."default",
    "userId" integer,
    "libraryId" integer,
    "bookId" integer,
    "libraryBookId" integer,
    "isPrimaryPic" boolean DEFAULT false,
    "processingStatus" "PhotoProcessingStatus" NOT NULL DEFAULT 'UNPROCESSED'::"PhotoProcessingStatus",
    "createdAt" timestamp(3) without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    "updatedAt" timestamp(3) without time zone NOT NULL,
    "otherData" jsonb,
    CONSTRAINT photos_pkey PRIMARY KEY (id),
    CONSTRAINT "photos_bookId_fkey" FOREIGN KEY ("bookId")
        REFERENCES public.books (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE SET NULL,
    CONSTRAINT "photos_libraryBookId_fkey" FOREIGN KEY ("libraryBookId")
        REFERENCES public.library_books (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE SET NULL,
    CONSTRAINT "photos_libraryId_fkey" FOREIGN KEY ("libraryId")
        REFERENCES public.libraries (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE SET NULL,
    CONSTRAINT "photos_userId_fkey" FOREIGN KEY ("userId")
        REFERENCES public.users (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE SET NULL
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.photos
    OWNER to [REDACTED];
-- Index: photos_bookId_idx

-- DROP INDEX IF EXISTS public."photos_bookId_idx";

CREATE INDEX IF NOT EXISTS "photos_bookId_idx"
    ON public.photos USING btree
    ("bookId" ASC NULLS LAST)
    TABLESPACE pg_default;
-- Index: photos_libraryId_idx

-- DROP INDEX IF EXISTS public."photos_libraryId_idx";

CREATE INDEX IF NOT EXISTS "photos_libraryId_idx"
    ON public.photos USING btree
    ("libraryId" ASC NULLS LAST)
    TABLESPACE pg_default;
-- Index: photos_userId_idx

-- DROP INDEX IF EXISTS public."photos_userId_idx";

CREATE INDEX IF NOT EXISTS "photos_userId_idx"
    ON public.photos USING btree
    ("userId" ASC NULLS LAST)
    TABLESPACE pg_default;

The query itself is:

SELECT
  books.id, books.title, books.authors, books.meta
FROM books
LEFT JOIN photos ON photos."bookId" = books.id
WHERE photos.id IS NULL
AND books.id NOT IN (19643405,19702275,19784617,28454289,28491188,28491190,28491205,28521585,28521596,28521627,28521638,28521649,28521658,28521678,28521680,28521689,28521700,28518165,28515245,28515256,28515288,28515299,28515310,28515342,28515353,28515364,28515407,28515736,28518100,28518219,28518273,28518370,28518424,28518478,28518489)
AND meta NOT LIKE 'foo%'
AND meta != 'bar'
AND meta LIKE 'baz%'
LIMIT 10; 

CodePudding user response:

Those two plans are effectively tied, they differ by less than 1% in expected cost. The planner avoids fully fleshing out plans which are apparently tied, to avoid extra work.

See compare_path_costs_fuzzily in the source code.

CodePudding user response:

You should have those two indexes to speed up your query :

CREATE INDEX X1 ON books (meta, id) INCLUDE  (title, authors);
CREATE INDEX X2 ON photos (id, bookId);

Also rewrite your query to eliminates redundant member of the filter predicate :

SELECT  books.id, books.title, books.authors, books.meta
FROM    books
        LEFT OUTER JOIN photos 
           ON photos.bookId = books.id
WHERE  photos.id IS NULL
       AND books.id NOT IN (19643405,19702275,19784617,28454289,28491188,28491190,28491205,28521585,28521596,28521627,28521638,28521649,28521658,28521678,28521680,28521689,28521700,28518165,28515245,28515256,28515288,28515299,28515310,28515342,28515353,28515364,28515407,28515736,28518100,28518219,28518273,28518370,28518424,28518478,28518489)
   --  AND books.meta NOT LIKE '[REDACTED-1]%' --> useless because books.meta LIKE '[REDACTED-3]%'
   --  AND books.meta != '[REDACTED-2]'        --> useless because books.meta LIKE '[REDACTED-3]%'
       AND books.meta LIKE '[REDACTED-3]%'
LIMIT 10;

Finally using a temporary table will probably do a better job:

CREATE LOCAL TEMPORARY TABLE temp_books_NOT_IN
(id INT PRIMARY KEY);

INSERT INTO temp_books_NOT_IN VALUES 
(19643405),
(19702275),
(19784617),
(28454289),
(28491188),
(28491190),
(28491205),
(28521585),
(28521596),
(28521627),
(28521638),
(28521649),
(28521658),
(28521678),
(28521680),
(28521689),
(28521700),
(28518165),
(28515245),
(28515256),
(28515288),
(28515299),
(28515310),
(28515342),
(28515353),
(28515364),
(28515407),
(28515736),
(28518100),
(28518219),
(28518273),
(28518370),
(28518424),
(28518478),
(28518489);

SELECT  books.id, books.title, books.authors, books.meta
FROM    books
        LEFT OUTER JOIN photos 
           ON photos.bookId = books.id
WHERE  photos.id IS NULL
       AND books.meta LIKE '[REDACTED-3]%'
       AND books.id NOT IN (SELECT id FROM temp_books_NOT_IN)
LIMIT 10;
  • Related