From postgres docs on GIN -
Internally, a GIN index contains a B-tree index constructed over keys.
But in my use case, I see Bitmap
indexes instead -
My Schema and Indexes are created as below.
CREATE TABLE customer (
jdata jsonb NOT NULL,
comments text NULL
);
CREATE INDEX idx_jdata ON customer USING gin (jdata jsonb_path_ops);
Say, inserted a 10K records. sample prepared data/ alternate link
explain analyze select jdata from customer where jdata @> '{"supplier":{"id":"7f5644ca-f0d3-4f50-947b-9e3e38f7b796"}}'
Outpt -
Bitmap Heap Scan on file_set (cost=2139.27..36722.32 rows=10744 width=18) (actual time=1.438..267.122 rows=4048 loops=1)
Recheck Cond: (jdata @> '{"supplier": {"id": "7f5644ca-f0d3-4f50-947b-9e3e38f7b796"}}'::jsonb)
Heap Blocks: exact=1197
-> Bitmap Index Scan on idx_jdata (cost=0.00..2136.58 rows=10744 width=0) (actual time=1.214..1.214 rows=4048 loops=1)
Index Cond: (jdata @> '{"supplier": {"id": "7f5644ca-f0d3-4f50-947b-9e3e38f7b796"}}'::jsonb)
Planning Time: 0.065 ms
Execution Time: 267.456 ms
(the query plan above is based on real output form postgres - i had to change table/column names)
Why is Bitmap
index created when a GIN
index is created
My Postgres DB version is 13.5.
After looking at comment from @a-horse-with-no-name -> I tried below
SET enable_seqscan = OFF;
explain analyze
select * from pg_opclass where "oid"=10003;
SET enable_seqscan = ON;
and the output -
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------------------------
Index Scan using pg_opclass_oid_index on pg_opclass (cost=0.14..8.16 rows=1 width=93) (actual time=0.015..0.016 rows=1 loops=1)|
Index Cond: (oid = '10003'::oid) |
Planning Time: 0.060 ms |
Execution Time: 0.027 ms |
I see difference - "Bitmap Index Scan on idx_jdata" vs "Index Scan using pg_opclass_oid_index on pg_opclass"
Does this mean anything important. Can some one add more details about the using
vs on
and the on pg_opsclass
.
CodePudding user response:
A GIN index can contain the same tuple pointer multiple times, listed under different tokens which the field is broken down into. Those need to be de-duplicated to make sure it doesn't return duplicate rows where it should not, and the way GIN chooses to do this is by forcing it go through a bitmap index scan; which inherently deduplicates the pointers. It is not the index which is the bitmap, it is the scan which uses bitmaps. Any index can be used in a bitmap index scan, but GIN indexes can only be used in bitmap scans, due to the need to deduplicate.
CodePudding user response:
This is the correct typing for indexing a field of JSON column :
CREATE INDEX json_btree ON customer USING btree((jdata ->> 'jsonb_path_ops' ));
The two typings below does not index the JSON properly even if does a successful compilation:
CREATE INDEX json_gin ON customer USING gin( jdata jsonb_path_ops);
CREATE INDEX json_btree_padrao ON customer USING btree(jdata );