I have a table reports
with a jsonb
field items
. For each item in the report, it contains a field called score
(other fields are unrelated to this question).
reports
id, items
1, [{"score": 1.0}, {"score": 2.0}]
2, [{"score": 3.0}, {"score": 9.0}]
Is it possible to add index so that it's fast for the query "give me all the reports with at least one item.score > 5"?:
SELECT id FROM reports where items @? '$[*] ? (@.score > 5.0)';
// will return id = 2
I have tried gin index on items
but it only works for exact matches such as @.score == 5.0
CodePudding user response:
https://www.postgresql.org/docs/current/indexes-index-only-scans.html
GIN INDEX cannot index-only-scan.
https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
The default GIN operator class for jsonb supports queries with the key-exists operators ?, ?| and ?&, the containment operator @>, and the jsonpath match operators @? and @@.
So @? is supported.
demo
CREATE INDEX jidx ON test1 USING gin (data jsonb_path_ops);
SET enable_seqscan = OFF;
EXPLAIN (
COSTS OFF
)
SELECT
*
FROM
test1
WHERE
data @ ? '$[*] ? (@.score > 5.0)';
Further tutorial: https://pganalyze.com/blog/gin-index
CodePudding user response:
You can define a function which returns the max score, then index that function and rewrite the query to use it.
create function max_score(jsonb) returns float language sql as $$
select max((value->'score')::float) from jsonb_array_elements($1);
$$ immutable;
create index on reports (max_score(items));
SELECT id FROM reports where max_score(items) > 5;
Obviously the max score being >5 is identical to at least one score being >5. This type of reduction won't be possible in all cases, so you should still consider a better data model.