Home > Blockchain >  Handling different json formats in Presto when extracting as string?
Handling different json formats in Presto when extracting as string?

Time:09-21

I am trying to extract a string from a JSON in presto. Here are the 2 options for how the extracted data can look before I cast to a string:

  1. JSON_EXTRACT(my_column, '$.my_column_extract') = ["A::1","B::2","C::3","D::4"]
  2. JSON_EXTRACT(my_column, '$.my_column_extract') = {"0":"A::1","1":"B::2","2":"C::3","3":"D::4,5"}

For both above options, the goal string format I want to achieve: For 1) A::1, B::2, C::3, D::4 For 2) A::1, B::2, C::3, D::4,5

So I cast as ARRAY(VARCHAR) and then ARRAY_JOIN to get a string. This works for 1) but not for 2).

  1. ARRAY_JOIN( CAST( JSON_EXTRACT(my_column, '$.my_column_extract') AS ARRAY(VARCHAR) ), ', ' ) = A::1, B::2, C::3, D::4
  2. The same functions result in an error due the format. Error notice (INVALID_CAST_ARGUMENT) Cannot cast to array(varchar). Expected a json array, but got {"0":"A::1","1":"B::2","2":"C::3","3":"D::4,5"}

How can I get 2) to result in the goal string format?

CodePudding user response:

If your records are available only in those two formats then you can leverage try_cast function which returns null if the cast can't be performed and use coalesce two cast to the second format:

-- sample data
WITH dataset(my_column) AS (
 values ('{"my_column_extract":["A::1","B::2","C::3","D::4"]}'),
        ('{"my_column_extract":{"0":"A::1","1":"B::2","2":"C::3","3":"D::4,5"}}')
),

-- query parts
extracted as (
    select JSON_EXTRACT(my_column, '$.my_column_extract') extr
    from dataset)

SELECT coalesce(
        try_cast(extr as array(varchar)), -- try process array
        map_values(cast(extr as map(varchar, varchar)))) -- if not array - process json object as map and get map values
FROM extracted;

Output:

_col0
[A::1, B::2, C::3, D::4]
[A::1, B::2, C::3, D::4,5]

And then process resulting arrays as needed

  • Related