Home > Back-end >  Postgres partial vs regular index
Postgres partial vs regular index

Time:02-21

I have a table with 1m records, with 100k records having null on colA. Remaining records have pretty distinct values, is there a difference in creating a regular index on this column vs a partial index with where colA is not null?

Since regular Postgres indexes do not store NULL values, wouldn't it be the same as creating a partial index with where colA is not null?
Any pros or cons with either indexes?

CodePudding user response:

If you create a partial index without nulls, it will not use it to find nulls.

Here's a test with a full index on 13.5.

# create index idx_test_num on test(num);
CREATE INDEX

# explain select count(*) from test where num is null;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Aggregate  (cost=5135.00..5135.01 rows=1 width=8)
   ->  Bitmap Heap Scan on test  (cost=63.05..5121.25 rows=5500 width=0)
         Recheck Cond: (num IS NULL)
         ->  Bitmap Index Scan on idx_test_num  (cost=0.00..61.68 rows=5500 width=0)
               Index Cond: (num IS NULL)
(5 rows)

And with a partial index.

# create index idx_test_num on test(num) where num is not null;
CREATE INDEX

# explain select count(*) from test where num is null;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10458.12..10458.13 rows=1 width=8)
   ->  Gather  (cost=10457.90..10458.11 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=9457.90..9457.91 rows=1 width=8)
               ->  Parallel Seq Scan on test  (cost=0.00..9352.33 rows=42228 width=0)
                     Filter: (num IS NULL)
(6 rows)

Since regular postgres indexes do not store NULL values...

This has not been true since version 8.2 [checks notes] 16 years ago. The 8.2 docs say...

Indexes are not used for IS NULL clauses by default. The best way to use indexes in such cases is to create a partial index using an IS NULL predicate.

8.3 introduced nulls first and nulls last and many other improvements around nulls including...

Allow col IS NULL to use an index (Teodor)

CodePudding user response:

It all depends.

NULL values are included in (default) B-tree indexes since version Postgres 8.3, like Schwern provided. However, predicates like the one you mention (where colA is not null) are only properly supported since Postgres 9.0. The release notes:

Allow IS NOT NULL restrictions to use indexes (Tom Lane)

This is particularly useful for finding MAX()/MIN() values in indexes that contain many null values.

GIN indexes followed later:

As of PostgreSQL 9.1, null key values can be included in the index.

Typically, a partial index makes sense if it excludes a major part of the table from the index, making it substantially smaller and saving writes to the index. Since B-tree indexes are so shallow, bare seek performance scales fantastically (once the index is cached). 10 % fewer index entries hardly matter in that area.

Your case would exclude only around 10% of all rows, and that rarely pays. A partial index adds some overhead for the query planner and excludes queries that don't match the index condition. (The Postgres query planner doesn't try hard if the match is not immediately obvious.)

OTOH, Postgres will rarely use an index for predicates retrieving 10 % of the table - a sequential scan will typically be faster. Again, it depends.

If (almost) all queries exclude NULL anyway (in a way the Postgres planner understands), then a partial index excluding only 10 % of all rows is still a sensible option. But it may backfire if query patterns change. The added complexity may not be worth it.

Also worth noting that there are still corner cases with NULL values in Postgres indexes. I bumped into this case recently where Postgres proved unwilling to read sorted rows from a multicolumn index when the first index expression was filtered with IS NULL (making a partial index preferable for the case):

db<>fiddle here

So, it depends on the complete picture.

  • Related