I have a postgres table that is created using benthos.dev/Golang and cannot change how it is created. This table has json data inserted into columns as strings. A separate benthos yaml will query this using postgres. For the select statement from that table, the intent is to find the element with a given name for "system" and return the value of the corresponding "value". (not the value of system, the value of the element called "value") All responses need to be on the same row. Here is a mock up Column name : Suigeneris
name | Suigeneris |
---|---|
baz | [{"system":"https://alpha.com/externalid","use":"home","value":"JJO5H0001"},{"system":"http://gee.org/berlin/uni/Suigeneris-num","use":"usual","value":"2122206077"}] |
The desired output from a select is
name | Suigeneris | Sui-alpha | Sui-berlin |
---|---|---|---|
baz | [{"system":"https://alpha.com/externalid","use":"home","value":"JJO5H0001"},{"system":"http://gee.org/berlin/uni/Suigeneris-num","use":"usual","value":"2122206077"}] | JJO5H0001 | 2122206077 |
I've tried two approaches and am using this query for advice on using postgres to select the desired data. I tried doing this in postgres by calculating the position of the {} and then using that to separate but this creates a difficult to read and easy to mess up long query.
example:
select substring (Suigeneris from (position ( '{' IN Suigeneris )) for (position ( '}' IN Suigeneris ))-(position ( '{' IN Suigeneris )))
from table
I only tried with the first position.
I also tried with regex_match and this works but does require that the order of the names is always the same (e.g. "system is first). It looks like this is true but not sure what would happen if the source system decided to switch it.
SELECT (regexp_match(identifier,'("system":"http:\/\/gee.org\/berlin\/uni\/Suigeneris-num.*)(?=})')) as npi
FROM table
However, the problem is that I have to put that query into a YAML file and get 'unexpected scalar at node end' - I have tried escaping the double quotes with single '"'and double """ quotes. Looking at the Yaml documentation, I am not understanding a solution. The postgres query will go into a yaml file.
Note again that the intention is to read one name:value pair and extract the value from a different name:value pair in that object. For "system": "https...alpha..." extract the value of "value". Both solutions extract the whole sequence relating to one value with the intention of extracting the desired value in the next step. But it would be ideal to get the value as described. There is no reason to keep the entire object within the {}.
An option might be to get the lookahead match in regex to pick up the first bracket before the "berlin" text without specifying the entire string. I have failed to find a way to position the regex at say "berlin" and find the first { before and after that as a match. That might allow not having the full https address and might solve the yaml scalar problem.
Or there might be a better way using "PostgreSQL 14.2
CodePudding user response:
You can use a JSON path query for that:
select name,
"unique",
jsonb_path_query_first("unique", '$[*] ? (@.system == "https://alpha.com/externalid").value') #>> '{}' as unique_alpha,
jsonb_path_query_first("unique", '$[*] ? (@.system == "http://gee.org/berlin/uni/unique-num").value') #>> '{}' as unique_berlin
from the_table
jsonb_path_query_first
returns a jsonb
value. Unfortunately there is no direct cast to text from jsonb
. For that #>> '{}'
is used.
Note that unique
is a really bad choice for a column name, as that is a reserved keyword.