Home > Enterprise >  AWS athena deep scan for JSON Data
AWS athena deep scan for JSON Data

Time:11-12

I am not able to query the nested keys with .. operator which is supported by JSONPath in Athena.

WITH dataset AS (
   SELECT '{"name": "aa",
             "id":-100,
             "org": "engineering",
             "key":{"val":1000},
             "projects": [{"name":"project1", "completed":false},{"name":"project2", "completed":true}]}'
     AS blob1
)
-- This is not working
SELECT * FROM dataset where json_extract_scalar(blob1, '$..val') = '1000';

-- Working
SELECT * FROM dataset where json_extract_scalar(blob1, '$.key.val') = '1000';

Is there any alternative way by which I can query nested fields in Athena?

CodePudding user response:

A path is indefinite if it contains .. - a deep scan operator. Indefinite paths always returns a list(as represented by current JsonProvider).

Do not use the json_extract_scalar function on arrays, maps, or structs.

To obtain the first element in the array, use the json_array_get function along with json_extract and specify the index position.

SELECT * FROM dataset where json_array_get(json_extract(blob1, '$..val'), 0) = '1000';

CodePudding user response:

Athena/Presto json path support is not very great. You can work around that but it is quite messy involving casting your json to MAP(VARCHAR, JSON) and then working with resulting values:

WITH dataset AS (
    SELECT '{"name": "aa",
             "id":-100,
             "org": "engineering",
             "key":{"val":1000},
             "projects": [{"name":"project1", "completed":false},{"name":"project2", "completed":true}]}' AS blob1
)

SELECT *
FROM dataset
WHERE contains(
        transform(
            map_values(cast(json_parse(blob1) as MAP(VARCHAR, JSON))),
            j->try(json_extract_scalar(j, '$.val'))
        ),
        '1000'
    )
  • Related