Home > front end >  Most efficient way to index an array column to allow for selecting rows containing a value
Most efficient way to index an array column to allow for selecting rows containing a value

Time:03-23

Say I have a table like this:

create table mytable (
   mycol text[]
)

And I want to select all rows where mycol contains "hello". I can think of two ways to do this:

SELECT * FROM mytable WHERE 'hello'=any(mycol);
--or
SELECT * FROM mytable WHERE mycol && '{hello}';

I know for the second option I can use GIN indexes (which allow for array options), and I'm pretty sure for the first I would use a BTREE (or maybe a HASH?).

So my question is this: If I only need to check the membership of a single item, which method with what index is most efficient for a table with millions of rows?

CodePudding user response:

The 2nd, with a GIN index.

The first one can't use either a btree or a hash index in an efficient way. It can use a btree index, but only as a skinny copy of the table.

You might be confusing this with the reverse situation, where the column is before the ANY (and is a scalar) and the literal is inside the ANY. This one can use the btree.

SELECT * FROM mytable_scalar WHERE mycol =any('{hello,goodbye}');

A good way to see how efficient something will be is often just to try it with fake data but of a vaguely realistic size:

insert into mytable select ARRAY[md5(random()::text),md5(random()::text)] from generate_series(1,1500000);
  • Related