Postgres Query uses Bitmap index scan on GIN indexed item. Does not GIN use B-Tree internally?-CodePudding

From postgres docs on GIN -

Internally, a GIN index contains a B-tree index constructed over keys.

But in my use case, I see Bitmap indexes instead -

My Schema and Indexes are created as below.

CREATE TABLE customer (
    jdata jsonb NOT NULL,
    comments text NULL
);

CREATE INDEX idx_jdata ON customer USING gin (jdata jsonb_path_ops);

Say, inserted a 10K records. sample prepared data/ alternate link

explain analyze select jdata from  customer where jdata @> '{"supplier":{"id":"7f5644ca-f0d3-4f50-947b-9e3e38f7b796"}}'

Outpt -

Bitmap Heap Scan on file_set  (cost=2139.27..36722.32 rows=10744 width=18) (actual time=1.438..267.122 rows=4048 loops=1)
  Recheck Cond: (jdata @> '{"supplier": {"id": "7f5644ca-f0d3-4f50-947b-9e3e38f7b796"}}'::jsonb)
  Heap Blocks: exact=1197
  ->  Bitmap Index Scan on idx_jdata  (cost=0.00..2136.58 rows=10744 width=0) (actual time=1.214..1.214 rows=4048 loops=1)
        Index Cond: (jdata @> '{"supplier": {"id": "7f5644ca-f0d3-4f50-947b-9e3e38f7b796"}}'::jsonb)
Planning Time: 0.065 ms
Execution Time: 267.456 ms

(the query plan above is based on real output form postgres - i had to change table/column names)

Why is Bitmap index created when a GIN index is created

My Postgres DB version is 13.5.

After looking at comment from @a-horse-with-no-name -> I tried below

SET enable_seqscan = OFF;
explain analyze
select * from pg_opclass where "oid"=10003;
SET enable_seqscan = ON;

and the output -

QUERY PLAN                                                                                                                      |
-------------------------------------------------------------------------------------------------------------------------------- 
Index Scan using pg_opclass_oid_index on pg_opclass  (cost=0.14..8.16 rows=1 width=93) (actual time=0.015..0.016 rows=1 loops=1)|
  Index Cond: (oid = '10003'::oid)                                                                                              |
Planning Time: 0.060 ms                                                                                                         |
Execution Time: 0.027 ms                                                                                                        |

I see difference - "Bitmap Index Scan on idx_jdata" vs "Index Scan using pg_opclass_oid_index on pg_opclass"

Does this mean anything important. Can some one add more details about the using vs on and the on pg_opsclass.

CodePudding user response：

A GIN index can contain the same tuple pointer multiple times, listed under different tokens which the field is broken down into. Those need to be de-duplicated to make sure it doesn't return duplicate rows where it should not, and the way GIN chooses to do this is by forcing it go through a bitmap index scan; which inherently deduplicates the pointers. It is not the index which is the bitmap, it is the scan which uses bitmaps. Any index can be used in a bitmap index scan, but GIN indexes can only be used in bitmap scans, due to the need to deduplicate.

CodePudding user response：

This is the correct typing for indexing a field of JSON column :

CREATE INDEX json_btree ON customer USING btree((jdata  ->> 'jsonb_path_ops' ));

The two typings below does not index the JSON properly even if does a successful compilation:

CREATE INDEX json_gin ON customer USING gin( jdata  jsonb_path_ops); 
CREATE INDEX json_btree_padrao ON customer USING btree(jdata  );