With standard fields, like id, this works perfectly. But I cannot find a way to parse the custom fields, where the structure is:
"custom_fields": [
{
"id": 57852188,
"value": ""
},
{
"id": 57522467,
"value": ""
},
{
"id": 57522487,
"value": ""
}
]
The general format that I have been using is:
select v:id, v:updatedat
from zd_tickets;
Updated with sample data:
{
  "id": 151693,
  "brand_id": 36000,
  "created_at": "2022-05-23T19:26:35Z",
  "custom_fields": [
    { "id": 57866008, "value": false },
    { "id": 360022282754, "value": "" },
    { "id": 80814087, "value": "NC" }
  ],
  "group_id": 36000770
}
CodePudding user response:
If you want to select all repeating elements you will need to use FLATTEN; otherwise you can use standard notation. This is all documented here: https://docs.snowflake.com/en/user-guide/querying-semistructured.html#retrieving-a-single-instance-of-a-repeating-element
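For example (a minimal sketch, assuming a zd_tickets table with a variant column v, as in the question), standard notation reads one element of the array by position, while FLATTEN expands all of them into rows:

-- one instance of the repeating element, by position
select
    v:custom_fields[0].id::number as first_field_id
    ,v:custom_fields[0].value::text as first_field_value
from zd_tickets;

-- every element, one row each
select
    f.value:id::number as field_id
    ,f.value:value::text as field_value
from zd_tickets
    ,table(flatten(input => v:custom_fields)) f;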
CodePudding user response:
So, using this CTE to access the data in a way that looks like a table:
with data(json) as (
    select parse_json(column1) from values
    ('{
        "id": 151693,
        "brand_id": 36000,
        "created_at": "2022-05-23T19:26:35Z",
        "custom_fields": [
            { "id": 57866008, "value": false },
            { "id": 360022282754, "value": "" },
            { "id": 80814087, "value": "NC" }
        ],
        "group_id": 36000770
    }')
)
SQL to unpack the top-level items, which, as you have shown, you already have working:
select
    json:id::number as id
    ,json:brand_id::number as brand_id
    ,try_to_timestamp(json:created_at::text, 'yyyy-mm-ddThh:mi:ssZ') as created_at
    ,json:custom_fields as custom_fields
from data;
gives:
ID | BRAND_ID | CREATED_AT | CUSTOM_FIELDS |
---|---|---|---|
151693 | 36000 | 2022-05-23 19:26:35.000 | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] |
So now, how do we tackle that JSON array of custom_fields?
Well, if you only ever have three values, and the order is always the same:
select
    to_array(json:custom_fields) as custom_fields_a
    ,custom_fields_a[0] as field_0
    ,custom_fields_a[1] as field_1
    ,custom_fields_a[2] as field_2
from data;
gives:
CUSTOM_FIELDS_A | FIELD_0 | FIELD_1 | FIELD_2 |
---|---|---|---|
[ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] | { "id": 57866008, "value": false } | { "id": 360022282754, "value": "" } | { "id": 80814087, "value": "NC" } |
So we can use FLATTEN to access those objects, which makes "more rows":
select
    d.json:id::number as id
    ,d.json:brand_id::number as brand_id
    ,try_to_timestamp(d.json:created_at::text, 'yyyy-mm-ddThh:mi:ssZ') as created_at
    ,f.*
from data as d
    ,table(flatten(input => d.json:custom_fields)) f;
ID | BRAND_ID | CREATED_AT | SEQ | KEY | PATH | INDEX | VALUE | THIS |
---|---|---|---|---|---|---|---|---|
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 | | [0] | 0 | { "id": 57866008, "value": false } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] |
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 | | [1] | 1 | { "id": 360022282754, "value": "" } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] |
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 | | [2] | 2 | { "id": 80814087, "value": "NC" } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] |
So we can pull out known values (a manual PIVOT):
select
    d.json:id::number as id
    ,d.json:brand_id::number as brand_id
    ,try_to_timestamp(d.json:created_at::text, 'yyyy-mm-ddThh:mi:ssZ') as created_at
    ,max(iff(f.value:id = 80814087, f.value:value::text, null)) as v80814087
    ,max(iff(f.value:id = 360022282754, f.value:value::text, null)) as v360022282754
    ,max(iff(f.value:id = 57866008, f.value:value::text, null)) as v57866008
from data as d
    ,table(flatten(input => d.json:custom_fields)) f
group by 1,2,3, f.seq;
Grouping by f.seq means that if you have many "rows" of input, these will be kept apart, even if they share common values for columns 1, 2, and 3 (see the sketch after the results below).
gives:
ID | BRAND_ID | CREATED_AT | V80814087 | V360022282754 | V57866008 |
---|---|---|---|---|---|
151693 | 36000 | 2022-05-23 19:26:35.000 | NC | <empty string> | false |
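To see that in action, here is a quick sketch (the second input row is hypothetical, added only to show two tickets that share the same id):

with data(json) as (
    select parse_json(column1) from values
    ('{ "id": 151693, "custom_fields": [ { "id": 80814087, "value": "NC" } ] }'),
    ('{ "id": 151693, "custom_fields": [ { "id": 80814087, "value": "TX" } ] }')
)
select
    d.json:id::number as id
    ,max(iff(f.value:id = 80814087, f.value:value::text, null)) as v80814087
from data as d
    ,table(flatten(input => d.json:custom_fields)) f
group by 1, f.seq;

This returns two rows (NC and TX), because each input record gets its own f.seq, even though both rows share the same id.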
Now, if you do not know the names of the values, there is no way, short of dynamic SQL and double parsing, to turn rows into columns.
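One partial workaround (my sketch, not a true pivot) is to collapse each record's pairs into a single OBJECT keyed by the field id, using OBJECT_AGG, so any value can be looked up by its id without knowing all the ids up front:

with data(json) as (
    select parse_json(column1) from values
    ('{ "id": 151693, "custom_fields": [ { "id": 57866008, "value": false }, { "id": 80814087, "value": "NC" } ] }')
)
select
    d.json:id::number as id
    -- object_agg(key, value) builds { "57866008": false, "80814087": "NC" }
    ,object_agg(f.value:id::text, f.value:value) as fields
from data as d
    ,table(flatten(input => d.json:custom_fields)) f
group by 1, f.seq;

The result is still a single column, but fields:"80814087"::text now retrieves one value by its id, so the ids only need to be known at query time, not baked into the column list.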