Home > OS >  Athena/Presto: unnesting list of objects from serialized JSON string
Athena/Presto: unnesting list of objects from serialized JSON string

Time:11-30

My Athena/Presto database has a serialized JSON object, which I am having a difficulty to handle.
Here is an example of the data:

WITH input AS (
    SELECT JSON_PARSE('"[{\"is_flag\": 0, \"url\": \"https://app.com/123\"},{\"is_flag\": 1, \"url\": \"https://app.com/456\"}]"') AS json_obj
)

SELECT typeof(x),
    x

FROM
    (input) AS t(x)
    CROSS JOIN UNNEST (CAST(x AS ARRAY<JSON>)) AS u(y)
    

the error message says:

INVALID_CAST_ARGUMENT: Cannot cast to array(json). Expected a json array, but got [{"is_flag": 0, "url": "https://app.com/123"},{"is_flag": 1, "url": "https://app.com/456"}] "[{"is_flag": 0, "url": "https://app.com/123"},{"is_flag": 1, "url": "https://app.com/456"}]"

While what I eventually interested in is a table as below:

is_flag url
0 https://app.com/123
1 https://app.com/456

CodePudding user response:

Your json is actually double encoded one (i.e. it is a json string containing encoded json), so you need to double decode it with second json_parse (note that I use succinct syntax for unnest):

WITH input AS (
    SELECT JSON_PARSE('"[{\"is_flag\": 0, \"url\": \"https://app.com/123\"},{\"is_flag\": 1, \"url\": \"https://app.com/456\"}]"') AS json_obj
)

-- query
SELECT json_extract_scalar(obj, '$.is_flag') is_flag,
       json_extract_scalar(obj, '$.url') url
FROM input,
unnest(cast(json_parse(cast(json_obj as varchar)) as array(json))) as t(obj);

Output:

is_flag url
0 https://app.com/123|
1 https://app.com/456|
  • Related