Home > Software design >  How can I get PostgreSQL to use an index with a computed value in a predicate?
How can I get PostgreSQL to use an index with a computed value in a predicate?

Time:01-22

What I mean is this. In PostgreSQL (v 15.1) I have a table foo created in the following way.

create table foo (
  id integer primary key generated by default as identity,
  id_mod_7 int generated always as (id % 7) stored
);

create index on foo (id_mod_7, id);

insert into foo (id) select generate_series(1, 10000);

If I query this table with a predicate that doesn't use a literal constant but rather uses a function, a sequential scan is used:

explain analyze 
select count(1) from foo where id_mod_7 = extract(dow from current_date);

                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245.12..245.13 rows=1 width=8) (actual time=7.218..7.219 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..245.00 rows=50 width=0) (actual time=0.020..7.028 rows=1428 loops=1)
         Filter: ((id_mod_7)::numeric = EXTRACT(dow FROM CURRENT_DATE))
         Rows Removed by Filter: 8572
 Planning Time: 0.178 ms
 Execution Time: 7.281 ms

However, if I query this table with a predicate that does use a literal constant, an index scan is used:

explain analyze 
select count(1) from foo where id_mod_7 = 6;

                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48.84..48.85 rows=1 width=8) (actual time=0.321..0.322 rows=1 loops=1)
   ->  Index Only Scan using foo_id_mod_7_id_idx on foo  (cost=0.29..45.27 rows=1428 width=0) (actual time=0.022..0.214 rows=1428 loops=1)
         Index Cond: (id_mod_7 = 6)
         Heap Fetches: 0
 Planning Time: 0.106 ms
 Execution Time: 0.397 ms

I thought maybe I could fool it into using the index if I used the caching (alleged?) properties of Common Table Expressions (CTE), but to no avail:

explain analyze 
with param as (select extract(dow from current_date) as dow)
select count(1) from foo join param on id_mod_7 = dow;

                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245.12..245.13 rows=1 width=8) (actual time=5.830..5.831 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..245.00 rows=50 width=0) (actual time=0.025..5.668 rows=1428 loops=1)
         Filter: ((id_mod_7)::numeric = EXTRACT(dow FROM CURRENT_DATE))
         Rows Removed by Filter: 8572
 Planning Time: 0.234 ms
 Execution Time: 5.894 ms

It's not fatal, but I'm just trying to understand what's going on here. Thanks!

P.S. and just to avoid confusion, it's not the table column that is being computed within the SQL query. It's the value in the predicate expression that is (or would be) computed within the SQL query.

Like I said, I've tried using a CTE because I believed the CTE would be cached or materialized and expected an index scan, but unfortunately still got a sequential scan.

CodePudding user response:

This is because extract() returns a numeric value, but the column is an integer. You can see this effect in the execution plan: (id_mod_7)::numeric = ... - the column needs to be cast to a numeric to be able to match the value from the extract() function

You need to cast the result of the extract() function to an int:

select count(*) 
from foo 
where id_mod_7 = extract(dow from current_date)::int
  • Related