Home > Software engineering >  Is there a BigQuery function to extract a nested JSON?
Is there a BigQuery function to extract a nested JSON?

Time:03-11

[
  {
    "SnapshotDate": 20220224,
    "EquityUSD": 5530.22,
    "BalanceUSD": 25506.95,
    "jsonTransactions": "[{\"TransactionDate\":20220224,\"AccountTransactionID\":144155779,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000},{\"TransactionDate\":20220224,\"AccountTransactionID\":144155791,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000}]"
  }
]

Can somenone help me to extract this json string on bigquery. I can seem to get JSON_EXTRACT to work as it does not have a root element

CodePudding user response:

You might have to build a temp table to do this.

This first create statement would take a denormalized table convert it to a table with an array of structs.

The second create statement would take that temp table and embed the array into a (array of) struct(s).

You could remove the internal struct from the first query, and array wrapper the second query to build a strict struct of arrays. But this should be flexibe enough that you can create an array of structs, a struct of arrays or any combination of the two as many times as you want up to the 15 levels deep that BigQuery allows you to max out at.

The final outcome of this could would be a table with one column (column1) of a standard datatype, as well as an array of structs called OutsideArrayOfStructs. That Struct has two columns of "standard" datatypes, as well as an array of structs called InsideArrayOfStructs.

 CREATE OR REPLACE TABLE dataset.tempTable as (
     select 
         column1,
         column2,
         column3,
         ARRAY_AGG(
                 STRUCT(
                     ArrayObjectColumn1,
                     ArrayObjectColumn2,
                     ArrayObjectColumn3
                 )
             ) as InsideArrayOfStructs
     FROM
         sourceDataset.sourceTable
     GROUP BY 
         column1,
         column2,
         column3 )

 CREATE OR REPLACE TABLE dataset.finalTable as (
     select 
         column1,
         ARRAY_AGG(
                 STRUCT(
                     column2,
                     column3,
                     InsideArrayOfStructs
                 )
        ) as OutsideArrayOfStructs
     FROM
         dataset.tempTable
     GROUP BY
         Column1 )

CodePudding user response:

The double quotes in jsonTransactions are making the JSON invalid. JSON_EXTRACT_SCALAR(json_data, "$[0].jsonTransactions") returns [{ because the first pair of double quotes enclose [{. To circumvent this, I used regex to remove the double quotes of the jsonTransactions value. Now, the inner JSON string is considered an array.

After regex replacement, the outermost quotes have been removed as shown below. I replaced "[ and ]" with [ and ] respectively in the JSON string.

"jsonTransactions": [{"TransactionDate":20220224,"AccountTransactionID":144155779,"TransactionType":"Deposit","AmountUSD":2000},{"TransactionDate":20220224,"AccountTransactionID":144155791,"TransactionType":"Deposit","AmountUSD":2000}]

Consider the below query for your requirement. The JSON path for AmountUSD will be "$[0].jsonTransactions[0].AmountUSD".

WITH
  sample_table AS (
  SELECT
    '[{"SnapshotDate": 20220224,"EquityUSD": 5530.22,"BalanceUSD": 25506.95,"jsonTransactions": "[{\"TransactionDate\":20220224,\"AccountTransactionID\":144155779,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000},{\"TransactionDate\":20220224,\"AccountTransactionID\":144155791,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000}]"}]'
  AS json_data) as json_extracted
SELECT
  JSON_EXTRACT(REGEXP_REPLACE(REGEXP_REPLACE(json_data, r'"\[', '['), r'\]"', ']'),
    '$[0].jsonTransactions')
FROM
  sample_table;

Output:

enter image description here

As you had mentioned in the comments section, it is better to store the JSON itself in a more accessible format (one valid JSON object) instead of nesting JSON strings.

  • Related