With standard fields, like id, this works perfectly. But I cannot find a way to parse the custom fields, where the structure is:
"custom_fields": [
{
"id": 57852188,
"value": ""
},
{
"id": 57522467,
"value": ""
},
{
"id": 57522487,
"value": ""
}
]
The general format that I have been using is:
select v:id, v:updatedat
from zd_tickets;
Updated with sample data:
{
  "id": 151693,
  "brand_id": 36000,
  "created_at": "2022-05-23T19:26:35Z",
  "custom_fields": [
    { "id": 57866008, "value": false },
    { "id": 360022282754, "value": "" },
    { "id": 80814087, "value": "NC" }
  ],
  "group_id": 36000770
}
CodePudding user response:
If you want to select all repeating elements you will need to use FLATTEN; otherwise you can use standard notation. This is all documented here: https://docs.snowflake.com/en/user-guide/querying-semistructured.html#retrieving-a-single-instance-of-a-repeating-element
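For example (a minimal sketch, assuming a zd_tickets table with a variant column v, as in the question), standard notation reads one element of the array by position, while FLATTEN expands all of them into rows:

-- one instance of the repeating element, by position
select
    v:custom_fields[0].id::number as first_field_id
    ,v:custom_fields[0].value::text as first_field_value
from zd_tickets;

-- every element, one row each
select
    f.value:id::number as field_id
    ,f.value:value::text as field_value
from zd_tickets
    ,table(flatten(input => v:custom_fields)) f;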
CodePudding user response:
So, using this CTE to access the data in a way that looks like a table:
with data(json) as (
    select parse_json(column1) from values
    ('{
        "id": 151693,
        "brand_id": 36000,
        "created_at": "2022-05-23T19:26:35Z",
        "custom_fields": [
            { "id": 57866008, "value": false },
            { "id": 360022282754, "value": "" },
            { "id": 80814087, "value": "NC" }
        ],
        "group_id": 36000770
    }')
)
SQL to unpack the top-level items, which, as you have shown, you already have working:
select
    json:id::number as id
    ,json:brand_id::number as brand_id
    ,try_to_timestamp(json:created_at::text, 'yyyy-mm-ddThh:mi:ssZ') as created_at
    ,json:custom_fields as custom_fields
from data;
gives:
ID | BRAND_ID | CREATED_AT | CUSTOM_FIELDS |
---|---|---|---|
151693 | 36000 | 2022-05-23 19:26:35.000 | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] |
So now, how do we tackle that JSON array of custom_fields?
Well, if you only ever have three values, and the order is always the same:
select
    to_array(json:custom_fields) as custom_fields_a
    ,custom_fields_a[0] as field_0
    ,custom_fields_a[1] as field_1
    ,custom_fields_a[2] as field_2
from data;
gives:
CUSTOM_FIELDS_A | FIELD_0 | FIELD_1 | FIELD_2 |
---|---|---|---|
[ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] | { "id": 57866008, "value": false } | { "id": 360022282754, "value": "" } | { "id": 80814087, "value": "NC" } |
So we can use FLATTEN to access those objects, which makes "more rows":
select
    d.json:id::number as id
    ,d.json:brand_id::number as brand_id
    ,try_to_timestamp(d.json:created_at::text, 'yyyy-mm-ddThh:mi:ssZ') as created_at
    ,f.*
from data as d
    ,table(flatten(input => d.json:custom_fields)) f;
ID | BRAND_ID | CREATED_AT | SEQ | KEY | PATH | INDEX | VALUE | THIS |
---|---|---|---|---|---|---|---|---|
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 | | [0] | 0 | { "id": 57866008, "value": false } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] |
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 | | [1] | 1 | { "id": 360022282754, "value": "" } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] |
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 | | [2] | 2 | { "id": 80814087, "value": "NC" } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] |
So we can pull out known values (a manual PIVOT):
select
    d.json:id::number as id
    ,d.json:brand_id::number as brand_id
    ,try_to_timestamp(d.json:created_at::text, 'yyyy-mm-ddThh:mi:ssZ') as created_at
    ,max(iff(f.value:id = 80814087, f.value:value::text, null)) as v80814087
    ,max(iff(f.value:id = 360022282754, f.value:value::text, null)) as v360022282754
    ,max(iff(f.value:id = 57866008, f.value:value::text, null)) as v57866008
from data as d
    ,table(flatten(input => d.json:custom_fields)) f
group by 1,2,3, f.seq;
Grouping by f.seq means that if you have many "rows" of input, these will be kept apart, even if they share common values for columns 1, 2, and 3 (see the sketch after the results below).
gives:
ID | BRAND_ID | CREATED_AT | V80814087 | V360022282754 | V57866008 |
---|---|---|---|---|---|
151693 | 36000 | 2022-05-23 19:26:35.000 | NC | <empty string> | false |
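To see that in action, here is a quick sketch (the second input row is hypothetical, added only to show two tickets that share the same id):

with data(json) as (
    select parse_json(column1) from values
    ('{ "id": 151693, "custom_fields": [ { "id": 80814087, "value": "NC" } ] }'),
    ('{ "id": 151693, "custom_fields": [ { "id": 80814087, "value": "TX" } ] }')
)
select
    d.json:id::number as id
    ,max(iff(f.value:id = 80814087, f.value:value::text, null)) as v80814087
from data as d
    ,table(flatten(input => d.json:custom_fields)) f
group by 1, f.seq;

This returns two rows (NC and TX), because each input record gets its own f.seq, even though both rows share the same id.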
Now, if you do not know the names of the values, there is no way, short of dynamic SQL and double parsing, to turn rows into columns.
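One partial workaround (my sketch, not a true pivot) is to collapse each record's pairs into a single OBJECT keyed by the field id, using OBJECT_AGG, so any value can be looked up by its id without knowing all the ids up front:

with data(json) as (
    select parse_json(column1) from values
    ('{ "id": 151693, "custom_fields": [ { "id": 57866008, "value": false }, { "id": 80814087, "value": "NC" } ] }')
)
select
    d.json:id::number as id
    -- object_agg(key, value) builds { "57866008": false, "80814087": "NC" }
    ,object_agg(f.value:id::text, f.value:value) as fields
from data as d
    ,table(flatten(input => d.json:custom_fields)) f
group by 1, f.seq;

The result is still a single column, but fields:"80814087"::text now retrieves one value by its id, so the ids only need to be known at query time, not baked into the column list.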