I have a two part question
We have a PostgreSQL table with a jsonb column. The values in jsonb are valid jsons, but they are such that for some rows a node would come in as an array whereas for others it will come as an object.
for example, the json we receive could either be like this ( node4 I just an object )
"node1": {
"node2": {
"node3": {
"node4": {
"attr1": "7d181b05-9c9b-4759-9368-aa7a38b0dc69",
"attr2": "S1",
"UserID": "WebServices",
"attr3": "S&P 500*",
"attr4": "EI",
"attr5": "0"
}
}
}
}
Or like this ( node4 is an array )
"node1": {
"node2": {
"node3": {
"node4": [
{
"attr1": "7d181b05-9c9b-4759-9368-aa7a38b0dc69",
"attr2": "S1",
"UserID": "WebServices",
"attr3": "S&P 500*",
"attr4": "EI",
"attr5": "0"
},
{
"attr1": "7d181b05-9c9b-4759-9368-aa7a38b0dc69",
"attr2": "S1",
"UserID": "WebServices",
"attr3": "S&P 500*",
"attr4": "EI",
"attr5": "0"
}
]
}
}
}
And I have to write a jsonpath query to extract, for example, attr1, for each PostgreSQL row containing this json. I would like to have just one jsonpath query that would always work irrespective of whether the node is object or array. So, I want to use a path like below, assuming, if it is an array, it will return the value for all indices in that array.
jsonb_path_query(payload, '$.node1.node2.node3.node4[*].attr1')#>> '{}' AS "ATTR1"
I would like to avoid checking whether the type in array or object and then run a separate query for each and do a union.
Is it possible?
A sub-question related to above - Since I needed the output as text without the quotes, and somewhere I saw to use #>> '{}'
- so I tried that and it is working, but can someone explain, how that works?
The second part of the question is - the incoming json can have multiple sets of nested arrays and the json and the number of nodes is huge. So other part I would like to do is flatten the json into multiple rows. The examples I found were one has to identify each level and either use cross join or unnest. What I was hoping is there is a way to flatten a node that is an array, including all of the parent information, without knowing which, if any, if its parents are arrays or simple object. Is this possible as well?
Update
I tried to look at the documentation and tried to understand the #>> '{}'
construct, and then I came to realise that '{}' is the right hand operand for the #>> operator which takes a path and in my case the path is the current attribute value hence {}. Looking at examples that had non-empty single attribute path helped me realise that.
Thank you
CodePudding user response:
You can use a "recursive term" in the JSON path expression:
select t.some_column,
p.attr1
from the_table t
cross join jsonb_path_query(payload, 'strict $.**.attr1') as p(attr1)
Note that the strict
modifier is required, otherwise, each value will be returned multiple times.
This will return one row for each key attr1
found in any level of the JSON structure.
For the given sample data, this would return:
attr1
--------------------------------------
"7d181b05-9c9b-4759-9368-aa7a38b0dc69"
"7d181b05-9c9b-4759-9368-aa7a38b0dc69"
"7d181b05-9c9b-4759-9368-aa7a38b0dc69"
CodePudding user response:
"I would like to avoid checking whether the type in array or object and then run a separate query for each and do a union. Is it possible?"
Yes it is and your jsonpath query works fine in both cases either when node4 is a jsonb object or when it is a jsonb array because the jsonpath wildcard array accessor [*]
also works with a jsonb object in the lax mode
which is the default behavior (but not in the strict mode
see the manual). See the test results in dbfiddle.
"I saw to use #>> '{}' - so I tried that and it is working, but can someone explain, how that works?"
The output of the jsonb_path_query
function is of type jsonb
, and when the result is a jsonb string, then it is automatically displayed with double quotes "
in the query results. The operator #>>
converts the output into the text type which is displayed without "
in the query results and the associated text array '{}'
just point at the root of the passed jsonb data.
" the incoming json can have multiple sets of nested arrays and the json and the number of nodes is huge. So other part I would like to do is flatten the json into multiple rows"
you can refer to the answer of a_horse_with_no_name using the recursive wildcard member accessor .**