[
{
"SnapshotDate": 20220224,
"EquityUSD": 5530.22,
"BalanceUSD": 25506.95,
"jsonTransactions": "[{\"TransactionDate\":20220224,\"AccountTransactionID\":144155779,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000},{\"TransactionDate\":20220224,\"AccountTransactionID\":144155791,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000}]"
}
]
Can somenone help me to extract this json string on bigquery. I can seem to get JSON_EXTRACT to work as it does not have a root element
CodePudding user response:
You might have to build a temp table to do this.
This first create statement would take a denormalized table convert it to a table with an array of structs.
The second create statement would take that temp table and embed the array into a (array of) struct(s).
You could remove the internal struct from the first query, and array wrapper the second query to build a strict struct of arrays. But this should be flexibe enough that you can create an array of structs, a struct of arrays or any combination of the two as many times as you want up to the 15 levels deep that BigQuery allows you to max out at.
The final outcome of this could would be a table with one column (column1) of a standard datatype, as well as an array of structs called OutsideArrayOfStructs. That Struct has two columns of "standard" datatypes, as well as an array of structs called InsideArrayOfStructs.
CREATE OR REPLACE TABLE dataset.tempTable as (
select
column1,
column2,
column3,
ARRAY_AGG(
STRUCT(
ArrayObjectColumn1,
ArrayObjectColumn2,
ArrayObjectColumn3
)
) as InsideArrayOfStructs
FROM
sourceDataset.sourceTable
GROUP BY
column1,
column2,
column3 )
CREATE OR REPLACE TABLE dataset.finalTable as (
select
column1,
ARRAY_AGG(
STRUCT(
column2,
column3,
InsideArrayOfStructs
)
) as OutsideArrayOfStructs
FROM
dataset.tempTable
GROUP BY
Column1 )
CodePudding user response:
The double quotes in jsonTransactions
are making the JSON invalid. JSON_EXTRACT_SCALAR(json_data, "$[0].jsonTransactions")
returns [{
because the first pair of double quotes enclose [{
. To circumvent this, I used regex to remove the double quotes of the jsonTransactions
value. Now, the inner JSON string is considered an array.
After regex replacement, the outermost quotes have been removed as shown below. I replaced "[
and ]"
with [
and ]
respectively in the JSON string.
"jsonTransactions": [{"TransactionDate":20220224,"AccountTransactionID":144155779,"TransactionType":"Deposit","AmountUSD":2000},{"TransactionDate":20220224,"AccountTransactionID":144155791,"TransactionType":"Deposit","AmountUSD":2000}]
Consider the below query for your requirement. The JSON path for AmountUSD
will be "$[0].jsonTransactions[0].AmountUSD"
.
WITH
sample_table AS (
SELECT
'[{"SnapshotDate": 20220224,"EquityUSD": 5530.22,"BalanceUSD": 25506.95,"jsonTransactions": "[{\"TransactionDate\":20220224,\"AccountTransactionID\":144155779,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000},{\"TransactionDate\":20220224,\"AccountTransactionID\":144155791,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000}]"}]'
AS json_data) as json_extracted
SELECT
JSON_EXTRACT(REGEXP_REPLACE(REGEXP_REPLACE(json_data, r'"\[', '['), r'\]"', ']'),
'$[0].jsonTransactions')
FROM
sample_table;
Output:
As you had mentioned in the comments section, it is better to store the JSON itself in a more accessible format (one valid JSON object) instead of nesting JSON strings.