Home > Enterprise >  sql json extract scalar with different keys in json
sql json extract scalar with different keys in json

Time:08-11

have a 'dateinfo' column that I extract from a table in Athena, which is a json like the ones you can see below.

[{"pickuprequesteddate":"2022-08-09T00:00:00"}, {"deliveryrequesteddate":"2022-08-09T00:00:00"}]

[{"departureestimateddate":"2022-08-25T00:00:00"}, {"arrivalestimateddate":"2022-10-07T00:00:00"}, {}, {}]

As you can see inside the json there are different keys. I am interested in extracting the values for 'pickuprequesteddate' and 'deliveryrequesteddate' if they are in the json array. That is, for the examples above I would like to obtain as a result a column with the following values:

[2022-08-09T00:00:00,deliveryrequesteddate":"2022-08-09T00:00:00]
[null, null, null, null]

I know how to extract the values of each key but separately, using

TRANSFORM(CAST(stopinfo AS ARRAY<JSON>), x -> JSON_EXTRACT_SCALAR(x, '$.dateinfo.pickuprequesteddate')) as pickup,
    TRANSFORM(CAST(stopinfo AS ARRAY<JSON>), x -> JSON_EXTRACT_SCALAR(x, '$.dateinfo.deliveryrequesteddate')) as delivery

However, this gives me two separate columns.

How could I extract the values the way I want?

CodePudding user response:

If only one is present in the object you can use coalesce:

WITH dataset (stopinfo) AS (
   VALUES (JSON '[{"pickuprequesteddate":"2022-08-09T00:00:00"}, {"deliveryrequesteddate":"2022-08-09T00:00:00"}]'),
   (JSON '[{"departureestimateddate":"2022-08-25T00:00:00"}, {"arrivalestimateddate":"2022-10-07T00:00:00"}, {}, {}]')
)

-- query
select TRANSFORM(
        CAST(stopinfo AS ARRAY(JSON)),
        x-> coalesce(
            JSON_EXTRACT_SCALAR(x, '$.pickuprequesteddate'),
            JSON_EXTRACT_SCALAR(x, '$.deliveryrequesteddate')
        )
    )
from dataset;

Output:

_col0
[2022-08-09T00:00:00, 2022-08-09T00:00:00]
[NULL, NULL, NULL, NULL]
  • Related