Add and use index for jsonb with nested arrays-CodePudding

In my PostgreSQL 11.11 I have one jsonb column that holds objects like this:

{
   "dynamicFields":[
      {
         "name":"200",
         "hidden":false,
         "subfields":[
            {
               "name":"a",
               "value":"Subfield a"
            },
            {
               "name":"b",
               "value":"Subfield b"
            }           
         ]
      }
   ]
}

dynamicFields is an array and subfields is also an array and I having performance issues when hitting selects like this:

select *
from my_table a
cross join lateral jsonb_array_elements(jsonb_column -> 'dynamicFields') df
cross join lateral jsonb_array_elements(df -> 'subfields') sf
where df ->> 'name' = '200' and sf ->> 'name' = 'a'

The performance issues live mostly in the subfield. I have already added an index like this:

CREATE INDEX idx_my_index ON my_table USING gin ((marc->'dynamicFields') jsonb_path_ops);

How can I add an index for the subfields inside the dynamicFields?
The query above is just one example, I use it a lot in joins with other tables in the database. And I also know the @> operator.

CodePudding user response：

Index is for enhancing query performance on tables. Index can only be done on table columns and considering those columns that will be used in table join and where clause makes the indexing significant. For a jsonb column you can use create index on table_name using gin(column_name, jsonb_path_ops).

CodePudding user response：

You already have a very good index to support your query.
Make use of it with the jsonb "contains" operator" @>:

SELECT *
FROM   my_table
WHERE  marc->'dynamicFields' @> '[{"name": "200", "subfields":[{"name": "a"}]}]';

db<>fiddle here

Carefully match the structure of the JSON object in the table. Then rows are selected cheaply using the index.
You can then extract whatever parts you need from qualifying rows.

Detailed instructions:

Index for finding an element in a JSON array

If one of the filters is very selective on its own, it might be faster to split the two conditions like in your original. Either way, both variants should be fast:

SELECT *
FROM   my_table
WHERE  marc->'dynamicFields' @> '[{"name": "200"}]'
AND    marc->'dynamicFields' @> '[{"subfields":[{"name": "a"}]}]';