Home > database >  experimental with hash lookup for IN clauses with many constants in PostgreSQL
experimental with hash lookup for IN clauses with many constants in PostgreSQL

Time:02-27

Wondering around https://www.postgresql.org/docs/release/14.0/

  • Allow hash lookup for IN clauses with many constants (James Coleman, David Rowley)

example where empid in (1,2,3), Can I understand it this way: PostgreSQL will hash 1, 2, 3 into 3 different hash value, then for each value, compare with table's hashup, if match then grap the matched row.

  1. How do I know the query used hash lookup ?
  2. Previously How does Postgresql do lookup for IN clause with many constants?
  3. not that familiar with C,but want to see the code. Then I search git commit message https://prnt.sc/S15_5xc1cbBn, I cannot found relevant info, How can I search relevant info.

The following is QUERY PLAN for explain analyze select * from emp where empid in (1,2,3);

                                             QUERY PLAN -----------------------------------------------------------------------------------------------
     Seq Scan on emp  (cost=0.00..1.12 rows=3 width=190) (actual time=0.016..0.017 rows=3 loops=1)
       Filter: (empid = ANY ('{1,2,3}'::integer[]))
       Rows Removed by Filter: 4
     Planning Time: 1.054 ms
     Execution Time: 0.207 ms
    (5 rows)

CodePudding user response:

It easier to look into which tests are added with a feature to have an idea how it is working. Even though I did not find any benchmarks there are some regression test that gives an idea how this optimization working.

The related tests for this feature is in:

https://github.com/postgres/postgres/blob/master/src/test/regress/sql/expressions.sql

The MIN_ARRAY_SIZE_FOR_HASHED_SAOP threshold is 9 so you can observe the effects with 9 elements. Even though the optimization is not visible in explain output you can see its effect with some experimentation.

CREATE TABLE ternary (a int NOT NULL, b text NOT NULL, c float);

INSERT INTO ternary(a, b, c)
SELECT i AS a, md5(i::text) AS b, log(i) AS c
FROM generate_series(1, 10000000, 1) AS i;

ANALYZE ternary;


create function return_text_input(text) returns text as $$
begin
    return $1;
end;
$$ language plpgsql stable;


set jit = off;                  
set max_parallel_workers_per_gather = 0;

explain analyze 
select return_text_input(t.b)  FROM ternary as t where t.b in 
(   '1b0fd9efa5279c4203b7c70233f86db', 
    '252e691406782824eec43d7eadc3d25', 
    'a8d2ec85eaf98407310b72eb73dda24',
    '74687a12d3915d3c4d83f1af7b3683d',
    '596a3d04481816330f07e4f97510c28',
    '47c1b025fa18ea96c33fbb6718688c0',
    '0267aaf632e87a63288a08331f22c7c',
    'b3149ecea4628efd23d2f86e5a72347'  ); 
-- 8 elements - Execution Time: 1007.018 ms in my machine


explain analyze 
select return_text_input(t.b)  FROM ternary as t where t.b in 
(   '1b0fd9efa5279c4203b7c70233f86db', 
    '252e691406782824eec43d7eadc3d25', 
    'a8d2ec85eaf98407310b72eb73dda24',
    '74687a12d3915d3c4d83f1af7b3683d',
    '596a3d04481816330f07e4f97510c28',
    '47c1b025fa18ea96c33fbb6718688c0',
    '0267aaf632e87a63288a08331f22c7c',
    'b3149ecea4628efd23d2f86e5a72347',
    '5d7b9adcbe1c629ec722529dd12e512' );
-- 9 elements - Execution Time: 611.685 ms   in my machine 

As to how to be sure that the hash lookup is used in this case, I am not that familiar with postgresql maybe someone else can show an easy way to see it.

  • Related