Wondering around https://www.postgresql.org/docs/release/14.0/
- Allow hash lookup for IN clauses with many constants (James Coleman, David Rowley)
example where empid in (1,2,3)
, Can I understand it this way: PostgreSQL will hash 1
, 2
, 3
into 3 different hash value, then for each value, compare with table's hashup, if match then grap the matched row.
- How do I know the query used hash lookup ?
- Previously How does Postgresql do lookup for IN clause with many constants?
- not that familiar with C,but want to see the code. Then I search git commit message https://prnt.sc/S15_5xc1cbBn, I cannot found relevant info, How can I search relevant info.
The following is QUERY PLAN for explain analyze select * from emp where empid in (1,2,3);
QUERY PLAN -----------------------------------------------------------------------------------------------
Seq Scan on emp (cost=0.00..1.12 rows=3 width=190) (actual time=0.016..0.017 rows=3 loops=1)
Filter: (empid = ANY ('{1,2,3}'::integer[]))
Rows Removed by Filter: 4
Planning Time: 1.054 ms
Execution Time: 0.207 ms
(5 rows)
CodePudding user response:
It easier to look into which tests are added with a feature to have an idea how it is working. Even though I did not find any benchmarks there are some regression test that gives an idea how this optimization working.
The related tests for this feature is in:
https://github.com/postgres/postgres/blob/master/src/test/regress/sql/expressions.sql
The MIN_ARRAY_SIZE_FOR_HASHED_SAOP
threshold is 9 so you can observe the effects with 9 elements. Even though the optimization is not visible in explain
output you can see its effect with some experimentation.
CREATE TABLE ternary (a int NOT NULL, b text NOT NULL, c float);
INSERT INTO ternary(a, b, c)
SELECT i AS a, md5(i::text) AS b, log(i) AS c
FROM generate_series(1, 10000000, 1) AS i;
ANALYZE ternary;
create function return_text_input(text) returns text as $$
begin
return $1;
end;
$$ language plpgsql stable;
set jit = off;
set max_parallel_workers_per_gather = 0;
explain analyze
select return_text_input(t.b) FROM ternary as t where t.b in
( '1b0fd9efa5279c4203b7c70233f86db',
'252e691406782824eec43d7eadc3d25',
'a8d2ec85eaf98407310b72eb73dda24',
'74687a12d3915d3c4d83f1af7b3683d',
'596a3d04481816330f07e4f97510c28',
'47c1b025fa18ea96c33fbb6718688c0',
'0267aaf632e87a63288a08331f22c7c',
'b3149ecea4628efd23d2f86e5a72347' );
-- 8 elements - Execution Time: 1007.018 ms in my machine
explain analyze
select return_text_input(t.b) FROM ternary as t where t.b in
( '1b0fd9efa5279c4203b7c70233f86db',
'252e691406782824eec43d7eadc3d25',
'a8d2ec85eaf98407310b72eb73dda24',
'74687a12d3915d3c4d83f1af7b3683d',
'596a3d04481816330f07e4f97510c28',
'47c1b025fa18ea96c33fbb6718688c0',
'0267aaf632e87a63288a08331f22c7c',
'b3149ecea4628efd23d2f86e5a72347',
'5d7b9adcbe1c629ec722529dd12e512' );
-- 9 elements - Execution Time: 611.685 ms in my machine
As to how to be sure that the hash lookup is used in this case, I am not that familiar with postgresql maybe someone else can show an easy way to see it.