Home > Software design >  splitting a dict-like varchar column into multiple columns using SQL presto
splitting a dict-like varchar column into multiple columns using SQL presto

Time:03-08

I have a column in a table that is varchar but has a dictionary-like format. Some rows have more key-value pairs (for example first row has 3 pairs and second row has 4 pairs). For example:

column
{"customerid":"12345","name":"John", "likes":"Football, Running"}
{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"}

I need a query that can "explode" the column like so:

customerid name likes dislikes
12345 John Football, Running
54321 Sam Art Hiking

No extra rows are added. Just extra columns (There are other already existing columns in the table).

I've tried casting the varchar column to an array and then using UNNEST function but it doesn't work. I think that method creates extra rows.

I am using Prestosql.

CodePudding user response:

Your data looks like json, so you can parse and process it:

-- sample data
WITH dataset (column) AS (
    VALUES ('{"customerid":"12345","name":"John", "likes":"Football, Running"}' ),
        ('{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"}')
) 

--query
select json_extract_scalar(json_parse(column), '$.customerid') customerid,
    json_extract_scalar(json_parse(column), '$.name') name,
    json_extract_scalar(json_parse(column), '$.likes') likes,
    json_extract_scalar(json_parse(column), '$.dislikes') dislikes
from dataset

Output:

customerid name likes dislikes
12345 John Football, Running
54321 Sam Art Hiking

In case of many columns you can prettify it by casting to parsed json to map (depended on contents it can be map(varchar, varchar) or map(varchar, json)):

--query
select m['customerid'] customerid,
    m['name'] name,
    m['likes'] likes,
    m['dislikes'] dislikes
from (
    select cast(json_parse(column) as map(varchar, varchar)) m
    from dataset
)
  • Related