Home > Mobile >  Is it possible to index the position of an array column in PostgreSQL?
Is it possible to index the position of an array column in PostgreSQL?

Time:12-18

Let's say I want to find rows in the table my_table that have the value 5 at the first position of the array column my_array_column. To prepare the table, I executed the following statements:

CREATE TABLE my_table (
    id serial primary key,
    my_array_column  integer[]
);

CREATE INDEX my_table_my_array_column_index on "my_table" USING GIN ("my_array_column");

SET enable_seqscan TO off;

INSERT INTO my_table (my_array_column) VALUES ('{5,7,10}');

Now, the query can look like this:

select * from my_table where my_array_column[1] = 5;

This works, but it doesn't use the created GIN index. Is it possible to search for the value 5 at a specific position with an index?

CodePudding user response:

I want to find rows in the table my_table that have the value 5 at the first position of the array column

A partial index would be most efficient for that definition:

CREATE INDEX my_table_my_array_special_idx ON my_table ((true))
WHERE my_array_column[1] = 5;

If only a small fraction of rows qualifies, a partial index is accordingly smaller. Plus, the actual index column only occupies minimum space (typically 8 bytes). And, on top of that, Postgres 13 or later can apply index deduplication to make the index much smaller, yet.

Once the index is fully cached, its small size does not make it much faster, but still.
And most writes do not have to manipulate the index, which may be the most important benefit, depending on the workload.

Oh, and Postgres collects statistics for a partial index. So you can expect the query planner to make a fully educated choice when that index is involved.

Related:

It's applicable when the query repeats the same condition.

Typically, you have something useful as index field on top of your declared purpose. But if you don't, just use any small constant - true in my example, but anything < 8 bytes is equally good.

Minor disclaimer: The "first position" in a Postgres array does not necessarily have index 1. If non-standard array indexes are possible, consider:

...
WHERE (my_array_column[:])[1] = 5;

In index and queries.

See:

CodePudding user response:

If you always look at the first position a regular B-Tree index will do:

create index on my_table (my_array_column[1]);

If you don't know the position, then a GIN index is indeed needed, but you need to use an operator that is supported by a gin index, that would be e.g. the @> operator. But for that you need to use a different query:

select *
from my_table
where my_array_column @> array[5];

That would find all rows where the array column contains the value 5.


But you should head the advice given in the manual regarding the use of arrays:

Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.

CodePudding user response:

You can index just the first position. You need an extra set of parentheses in the create statement to do that:

create index on my_table ((my_array_column[1]));

Or you could augment your query to work with your gin index, on the theory that an array can't have the first element be 5 unless at least one element is 5.

select * from my_table where my_array_column[1] = 5 and my_array_column @> ARRAY[5];

Of course this won't be very efficient if a lot of your arrays contain 5, but in some other spot in the array. It would have to recheck all of those "false matches" to eliminate them. So if you only care about the first element, the first index I showed is better. (Of course, if you only care about the first element, why use an array to start with?)

  • Related