My Athena/Presto database has a serialized JSON object, which I am having a difficulty to handle.
Here is an example of the data:
WITH input AS (
SELECT JSON_PARSE('"[{\"is_flag\": 0, \"url\": \"https://app.com/123\"},{\"is_flag\": 1, \"url\": \"https://app.com/456\"}]"') AS json_obj
)
SELECT typeof(x),
x
FROM
(input) AS t(x)
CROSS JOIN UNNEST (CAST(x AS ARRAY<JSON>)) AS u(y)
the error message says:
INVALID_CAST_ARGUMENT: Cannot cast to array(json). Expected a json array, but got [{"is_flag": 0, "url": "https://app.com/123"},{"is_flag": 1, "url": "https://app.com/456"}] "[{"is_flag": 0, "url": "https://app.com/123"},{"is_flag": 1, "url": "https://app.com/456"}]"
While what I eventually interested in is a table as below:
is_flag | url |
---|---|
0 | https://app.com/123 |
1 | https://app.com/456 |
CodePudding user response:
Your json is actually double encoded one (i.e. it is a json string containing encoded json), so you need to double decode it with second json_parse
(note that I use succinct syntax for unnest
):
WITH input AS (
SELECT JSON_PARSE('"[{\"is_flag\": 0, \"url\": \"https://app.com/123\"},{\"is_flag\": 1, \"url\": \"https://app.com/456\"}]"') AS json_obj
)
-- query
SELECT json_extract_scalar(obj, '$.is_flag') is_flag,
json_extract_scalar(obj, '$.url') url
FROM input,
unnest(cast(json_parse(cast(json_obj as varchar)) as array(json))) as t(obj);
Output:
is_flag | url |
---|---|
0 | https://app.com/123| |
1 | https://app.com/456| |