I have data arriving as separate events in JSON form resembling:
{
"id":1234,
"data":{
"packet1":{"name":"packet1", "value":1},
"packet2":{"name":"packet2", "value":2}
}
}
I'd like to unnest the data to essentially have one row per 'packet' (there may be any number of packets).
id | name | value |
---|---|---|
1234 | packet1 | 1 |
1234 | packet2 | 2 |
I've looked at using the unnest function with the various JSON functions but it seems limited to working with arrays. I have not been able to find a way to treat the 'data' field as if it were an array.
At the moment, I cannot change these events to store packets in an array, and ideally the unnesting should be happening within BigQuery.
CodePudding user response:
1. Regular expressions
There might be other ways but you can consider below approach using regular expressions.
WITH sample_table AS (
SELECT """{
"id":1234,
"data":{
"packet1":{"name":"packet1", "value":1},
"packet2":{"name":"packet2", "value":2}
}
}""" AS events
)
SELECT JSON_VALUE(events, '$.id') AS id, name, value
FROM sample_table,
UNNEST (REGEXP_EXTRACT_ALL(events, r'"name":"(\w )"')) name WITH offset
JOIN UNNEST (REGEXP_EXTRACT_ALL(events, r'"value":([0-9.] )')) value WITH offset
USING (offset);
Query results
2. Javascript UDF
or, you might consider below using Javascript UDF.
CREATE TEMP FUNCTION extract_pair(json STRING)
RETURNS ARRAY<STRUCT<name STRING, value STRING>>
LANGUAGE js AS """
result = [];
for (const [key, value] of Object.entries(JSON.parse(json))) {
result.push(value);
}
return result;
""";
WITH sample_table AS (
SELECT """{
"id":1234,
"data":{
"packet1":{"name":"packet1", "value":1},
"packet2":{"name":"packet2", "value":2}
}
}""" AS events
)
SELECT JSON_VALUE(events, '$.id') AS id, obj.*
FROM sample_table, UNNEST(extract_pair(JSON_QUERY(events, '$.data'))) obj;
CodePudding user response:
@Jaytiger's suggestion of unnesting a regex extract led me to the following solution. The example I showed was simplified - there are more fields within the packets. To avoid requiring separate regex for each field name, I used regex to split/extract each individual packet, and then read the JSON.
This iteration doesn't do everything in one step but works when just looking at packets.
with sample_data
AS (SELECT """{"packet1":{"name":"packet1", "value":1},
"packet2":{"name":"packet2", "value":2}}""" as packets)
select
json_value('{'||packet||'}', "$.name") name,
json_value('{'||packet||'}', "$.value") value
from sample_data,
unnest(regexp_extract_all(packets, r'\:{(.*?)\}')) packet