Home > other >  Unnest non-array JSON in BigQuery
Unnest non-array JSON in BigQuery

Time:01-27

I have data arriving as separate events in JSON form resembling:

{
"id":1234,
"data":{
    "packet1":{"name":"packet1", "value":1},
    "packet2":{"name":"packet2", "value":2}
     }
}

I'd like to unnest the data to essentially have one row per 'packet' (there may be any number of packets).

id name value
1234 packet1 1
1234 packet2 2

I've looked at using the unnest function with the various JSON functions but it seems limited to working with arrays. I have not been able to find a way to treat the 'data' field as if it were an array.

At the moment, I cannot change these events to store packets in an array, and ideally the unnesting should be happening within BigQuery.

CodePudding user response:

1. Regular expressions

There might be other ways but you can consider below approach using regular expressions.

WITH sample_table AS (
  SELECT """{
    "id":1234,
    "data":{
      "packet1":{"name":"packet1", "value":1},
      "packet2":{"name":"packet2", "value":2}
     }
  }""" AS events
)
SELECT JSON_VALUE(events, '$.id') AS id, name, value
  FROM sample_table,
       UNNEST (REGEXP_EXTRACT_ALL(events, r'"name":"(\w )"')) name WITH offset
  JOIN UNNEST (REGEXP_EXTRACT_ALL(events, r'"value":([0-9.] )')) value WITH offset
 USING (offset);

Query results

enter image description here

2. Javascript UDF

or, you might consider below using Javascript UDF.

CREATE TEMP FUNCTION extract_pair(json STRING)
RETURNS ARRAY<STRUCT<name STRING, value STRING>>
LANGUAGE js AS """
  result = [];
  for (const [key, value] of Object.entries(JSON.parse(json))) {
    result.push(value);
  }
  return result;
""";

WITH sample_table AS (
  SELECT """{
    "id":1234,
    "data":{
      "packet1":{"name":"packet1", "value":1},
      "packet2":{"name":"packet2", "value":2}
     }
  }""" AS events
)
SELECT JSON_VALUE(events, '$.id') AS id, obj.*
  FROM sample_table, UNNEST(extract_pair(JSON_QUERY(events, '$.data'))) obj;

CodePudding user response:

@Jaytiger's suggestion of unnesting a regex extract led me to the following solution. The example I showed was simplified - there are more fields within the packets. To avoid requiring separate regex for each field name, I used regex to split/extract each individual packet, and then read the JSON.

This iteration doesn't do everything in one step but works when just looking at packets.

with sample_data
AS (SELECT """{"packet1":{"name":"packet1", "value":1},
               "packet2":{"name":"packet2", "value":2}}""" as packets)

select
    json_value('{'||packet||'}', "$.name") name,
    json_value('{'||packet||'}', "$.value") value
from sample_data,
unnest(regexp_extract_all(packets, r'\:{(.*?)\}')) packet
  • Related