In Postgres, how can I efficiently filter using the inner numbers of this jsonb structure?-CodePudding

So I work with Postgres SQL, and I have a jsonb column with the following structure:

{
   "Store1":[
      {
         "price":5.99,
         "seller":"seller"
      },
      {
         "price":56.43,
         "seller":"seller"
      }
   ],
   "Store2":[
      {
         "price":45.65,
         "seller":"seller"
      },
      {
         "price":44.66,
         "seller":"seller"
      }
   ]
}

I have a jsonb like this for every product in the database. I want to run an SQL query that will answer the following question: For each product, is one of the prices in this JSON is bigger/equal/smaller than X?

Basically filter the product to include only the ones who have at least one price that satisfies a mathematical condition. How can I do it efficiently? What's the best way in Postgres to iterate a JSON like this, with a relatively complex inner structure?

Also, if I could control the way the data is structured (to an extent, I can), what changes can I do to make this query more efficient?

Thanks!

CodePudding user response：

Use a json path expression:

WHERE col @@ '$.*[*].price < 20'

WHERE col @? '$.*[*] ? (@.price < 20)'

If you need to compare to another column or make the query parameterised, you can either build the jsonpath dynamically

WHERE col @@ format('$.*[*].price < %s', $1)::jsonpath
WHERE col @? format('$.*[*] ? (@.price < %s)', $1)::jsonpath

or you can use the respective function and pass variables as an object:

WHERE jsonb_path_match(col, '$.*[*].price < $limit', jsonb_build_object('limit', $1))
WHERE jsonb_path_exists(col, format('$.*[*] ? (@.price < $limit)', jsonb_build_object('limit', $1))

I admit I had to check my cheat sheet to figure out the right combination of operator and expression. Takeaways:

if a comparison operator needs to work with multiple values, it generally functions as an ANY
@@ does not work with ? (@ …) filter expressions since they don't return a boolean,
@? does not work with predicates since they always return a value (even if it's false)

CodePudding user response：

What changes can I do to make this query more efficient?

As @jjanes commented on my other answer, the jsonpath match col @@ '$.*[*].price < $limit' isn't going to be fast and needs to do full table scan, at least for < and >. To make a useful index, a different approach is required. An index can only have a single value to compare with, not any number. For that, we need to change the condition from EXISTS(SELECT prices_of(col) WHERE price < $limit) to (SELECT MIN(prices_of(col))) < $limit.

With this idea it is possible to build an expression index on the result of a custom immutable function:

CREATE FUNCTION min_price(data jsonb) RETURNS float
  LANGUAGE SQL
  IMMUTABLE
  RETURNS NULL ON NULL INPUT
  RETURN (
    SELECT min((offer ->> 'price')::float)
    FROM jsonb_each(data) AS entries(name, store),
    LATERAL jsonb_array_elements(store) AS elements(offer)
  );

CREATE INDEX example_min_data_price_idx ON example (min_price(data));

which you can use as

SELECT * FROM example WHERE min_price(data) < 20;

Looking for rows with a price larger than a certain number requires a separate index on max_price(data). If you want to use the index in a JOIN with more conditions, consider making it a multi-column index.

Looking for row with a price equalling a certain number can be optimised by indexing the jsonb column and using a jsonpath:

CREATE INDEX example_data_idx ON example USING GIN (data jsonb_ops);
SELECT * FROM example WHERE data @@ '$.*[*].price == 20';
SELECT * FROM example WHERE data @? '$.*[*] ? (@.price == 20)';

Unfortunately you can't use jsonb_path_ops here since that doesn't support the wildcard.