Say I have a table like this:
create table mytable (
mycol text[]
)
And I want to select all rows where mycol
contains "hello"
. I can think of two ways to do this:
SELECT * FROM mytable WHERE 'hello'=any(mycol);
--or
SELECT * FROM mytable WHERE mycol && '{hello}';
I know for the second option I can use GIN indexes (which allow for array options), and I'm pretty sure for the first I would use a BTREE (or maybe a HASH?).
So my question is this: If I only need to check the membership of a single item, which method with what index is most efficient for a table with millions of rows?
CodePudding user response:
The 2nd, with a GIN index.
The first one can't use either a btree or a hash index in an efficient way. It can use a btree index, but only as a skinny copy of the table.
You might be confusing this with the reverse situation, where the column is before the ANY (and is a scalar) and the literal is inside the ANY. This one can use the btree.
SELECT * FROM mytable_scalar WHERE mycol =any('{hello,goodbye}');
A good way to see how efficient something will be is often just to try it with fake data but of a vaguely realistic size:
insert into mytable select ARRAY[md5(random()::text),md5(random()::text)] from generate_series(1,1500000);