I have a python script that polls an API and stores the response in a BQ table with layered, stringified JSON field that looks like so:
"{""inputs"": {""Layer1"": {""Layer2"": {""Layer3"": {""item1"": 0.7, ""item2"": 10.0, ""item3"": 0.14}}}}"
I created the following query to unnest:
ro.id,
json_extract_scalar(s,
'$.item1') AS item1,
json_extract_scalar(s,
'$.item2') AS item2,
json_extract_scalar(s,
'$.item3') AS item3
FROM
`project.dataset.table` ro
LEFT JOIN
UNNEST(json_extract_array(response)) AS r
LEFT JOIN
UNNEST (json_extract_array("Layer1")) AS o
LEFT JOIN
UNNEST (json_extract_array("Layer2")) AS sc
LEFT JOIN
UNNEST (json_extract_array("Layer3")) AS s
The query runs and builds the table as expected, but all of the data from the unnested fields is null.
I do not have control over how the script stores the data, so I can only leverage SQL to solve this problem. Are the two sets of double quotes my main issue here? If so, what is the best way to address that?
CodePudding user response:
Consider below approach
SELECT
json_extract_scalar(items,'$.item1') AS item1,
json_extract_scalar(items,'$.item2') AS item2,
json_extract_scalar(items,'$.item3') AS item3,
FROM `project.dataset.table` ro,
UNNEST([struct(json_extract(response, '$.inputs.Layer1.Layer2.Layer3') as items)])
if applied to sample data in your question - output is