So I work with Postgres SQL, and I have a jsonb column with the following structure:
{
"Store1":[
{
"price":5.99,
"seller":"seller"
},
{
"price":56.43,
"seller":"seller"
}
],
"Store2":[
{
"price":45.65,
"seller":"seller"
},
{
"price":44.66,
"seller":"seller"
}
]
}
I have a jsonb like this for every product in the database. I want to run an SQL query that will answer the following question: For each product, is one of the prices in this JSON is bigger/equal/smaller than X?
Basically filter the product to include only the ones who have at least one price that satisfies a mathematical condition. How can I do it efficiently? What's the best way in Postgres to iterate a JSON like this, with a relatively complex inner structure?
Also, if I could control the way the data is structured (to an extent, I can), what changes can I do to make this query more efficient?
Thanks!
CodePudding user response:
Use a json path expression:
WHERE col @@ '$.*[*].price < 20'
or
WHERE col @? '$.*[*] ? (@.price < 20)'
If you need to compare to another column or make the query parameterised, you can either build the jsonpath dynamically
WHERE col @@ format('$.*[*].price < %s', $1)::jsonpath
WHERE col @? format('$.*[*] ? (@.price < %s)', $1)::jsonpath
or you can use the respective function and pass variables as an object:
WHERE jsonb_path_match(col, '$.*[*].price < $limit', jsonb_build_object('limit', $1))
WHERE jsonb_path_exists(col, format('$.*[*] ? (@.price < $limit)', jsonb_build_object('limit', $1))
I admit I had to check my cheat sheet to figure out the right combination of operator and expression. Takeaways:
- if a comparison operator needs to work with multiple values, it generally functions as an
ANY
@@
does not work with? (@ …)
filter expressions since they don't return a boolean,
@?
does not work with predicates since they always return a value (even if it'sfalse
)
CodePudding user response:
What changes can I do to make this query more efficient?
As @jjanes commented on my other answer, the jsonpath match col @@ '$.*[*].price < $limit'
isn't going to be fast and needs to do full table scan, at least for <
and >
. To make a useful index, a different approach is required. An index can only have a single value to compare with, not any number. For that, we need to change the condition from EXISTS(SELECT prices_of(col) WHERE price < $limit)
to (SELECT MIN(prices_of(col))) < $limit
.
With this idea it is possible to build an expression index on the result of a custom immutable function:
CREATE FUNCTION min_price(data jsonb) RETURNS float
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT
RETURN (
SELECT min((offer ->> 'price')::float)
FROM jsonb_each(data) AS entries(name, store),
LATERAL jsonb_array_elements(store) AS elements(offer)
);
CREATE INDEX example_min_data_price_idx ON example (min_price(data));
which you can use as
SELECT * FROM example WHERE min_price(data) < 20;
Looking for rows with a price larger than a certain number requires a separate index on max_price(data)
. If you want to use the index in a JOIN
with more conditions, consider making it a multi-column index.
Looking for row with a price equalling a certain number can be optimised by indexing the jsonb column and using a jsonpath:
CREATE INDEX example_data_idx ON example USING GIN (data jsonb_ops);
SELECT * FROM example WHERE data @@ '$.*[*].price == 20';
SELECT * FROM example WHERE data @? '$.*[*] ? (@.price == 20)';
Unfortunately you can't use jsonb_path_ops
here since that doesn't support the wildcard.