have a 'dateinfo' column that I extract from a table in Athena, which is a json like the ones you can see below.
[{"pickuprequesteddate":"2022-08-09T00:00:00"}, {"deliveryrequesteddate":"2022-08-09T00:00:00"}]
[{"departureestimateddate":"2022-08-25T00:00:00"}, {"arrivalestimateddate":"2022-10-07T00:00:00"}, {}, {}]
As you can see inside the json there are different keys. I am interested in extracting the values for 'pickuprequesteddate' and 'deliveryrequesteddate' if they are in the json array. That is, for the examples above I would like to obtain as a result a column with the following values:
[2022-08-09T00:00:00,deliveryrequesteddate":"2022-08-09T00:00:00]
[null, null, null, null]
I know how to extract the values of each key but separately, using
TRANSFORM(CAST(stopinfo AS ARRAY<JSON>), x -> JSON_EXTRACT_SCALAR(x, '$.dateinfo.pickuprequesteddate')) as pickup,
TRANSFORM(CAST(stopinfo AS ARRAY<JSON>), x -> JSON_EXTRACT_SCALAR(x, '$.dateinfo.deliveryrequesteddate')) as delivery
However, this gives me two separate columns.
How could I extract the values the way I want?
CodePudding user response:
If only one is present in the object you can use coalesce
:
WITH dataset (stopinfo) AS (
VALUES (JSON '[{"pickuprequesteddate":"2022-08-09T00:00:00"}, {"deliveryrequesteddate":"2022-08-09T00:00:00"}]'),
(JSON '[{"departureestimateddate":"2022-08-25T00:00:00"}, {"arrivalestimateddate":"2022-10-07T00:00:00"}, {}, {}]')
)
-- query
select TRANSFORM(
CAST(stopinfo AS ARRAY(JSON)),
x-> coalesce(
JSON_EXTRACT_SCALAR(x, '$.pickuprequesteddate'),
JSON_EXTRACT_SCALAR(x, '$.deliveryrequesteddate')
)
)
from dataset;
Output:
_col0 |
---|
[2022-08-09T00:00:00, 2022-08-09T00:00:00] |
[NULL, NULL, NULL, NULL] |