Home > database >  Presto: avoid null entries in maps and arrays
Presto: avoid null entries in maps and arrays

Time:07-10

I am new to presto and I have a table with a json column:

json-column
     
{"key-array":"[\"AAA\",\"BBB\",\"CCC\"]", "value-array":"[\"123\",\"456\",\"789\"]","str_1":"abc"}

I am trying to run the following query on the json-column.

select key, 
    value
from ( 
    select cast(json_parse(json_payload) as map(varchar, json)) parsed 
    from my_table
) a, 
unnest(
        cast(json_parse(cast(parsed['key-arr'] as varchar)) as array(varchar)), 
        cast(json_parse(cast(parsed['val-arr'] as varchar)) as array(varchar))
    ) as t(key, value)

Now it is possible that the map entries might be null. So the query above fails whenever there are NULL entries.

An example of the data where the query fails is:

{"key-array":"", "value-array":"","str_1":"abc"}

This is the error I see:

presto error: Cannot cast to array(varchar). Expected a json array, but got null

Upon searching for possible solutions, I came across two functions filter and coalesce.

So I tried the following:

select key, 
    value
from ( 
    select cast(json_parse(json_payload) as map(varchar, json)) parsed 
    from my_table
) a, 
unnest(
        cast(coalesce(json_parse(cast(parsed['key-arr'] as varchar)), json_parse(cast('["hello"]' as varchar))) as array(varchar)), 
        cast(coalesce(json_parse(cast(parsed['value-arr'] as varchar)), json_parse(cast('["hello1"]' as varchar))) as array(varchar))
    ) as t(key, value)

But I now get the following error:

presto error: Cannot convert value to JSON: ''

I am not sure what I am doing wrong here. Can someone kindly help me out here?

CodePudding user response:

Whoever double encoded the array should be punished a bit =). In this example the issue is that one property is double encoded and another one - is not, so for the second one json_parse fails cause it can't parse empty string (i.e. select json_parse('') will fail). If we don't want to dive into regexes (and I would say - we don't) you can use try (will return null if expression results in subset of errors) coalesce to wrap the json property processing for unnest, though it can potentially swallow some records where you have data you want to process:

-- sample data
WITH dataset(json_payload) AS (
    VALUES 
        ('{"key-array":"[\"AAA\",\"BBB\",\"CCC\"]", "value-array":"[\"123\",\"456\",\"789\"]"}'),
        ('{"key-array":"", "value-array":"","str_1":"abc"}')
) 

-- query
select key, 
    value
from ( 
    select cast(json_parse(json_payload) as map(varchar, json)) parsed 
    from dataset
) a, 
unnest(
        coalesce(try(cast(json_parse(cast(parsed['key-array'] as varchar)) as array(varchar))), array[]), 
        coalesce(try(cast(json_parse(cast(parsed['value-array'] as varchar)) as array(varchar))), array[])
    ) as t(key, value)

Output:

key value
AAA 123
BBB 456
CCC 789
  • Related