I have a column in a table that is varchar but has a dictionary-like format. Some rows have more key-value pairs than others (for example, the first row has 3 pairs and the second row has 4 pairs). For example:
column |
---|
{"customerid":"12345","name":"John", "likes":"Football, Running"} |
{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"} |
I need a query that can "explode" the column like so:
customerid | name | likes | dislikes |
---|---|---|---|
12345 | John | Football, Running | |
54321 | Sam | Art | Hiking |
No extra rows are added. Just extra columns (There are other already existing columns in the table).
I've tried casting the varchar column to an array and then using the UNNEST function, but it doesn't work. I think that method creates extra rows.
I am using PrestoSQL.
CodePudding user response:
Your data looks like JSON, so you can parse and process it:
-- sample data
WITH dataset (column) AS (
    VALUES ('{"customerid":"12345","name":"John", "likes":"Football, Running"}'),
           ('{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"}')
)
--query
select json_extract_scalar(json_parse(column), '$.customerid') customerid,
       json_extract_scalar(json_parse(column), '$.name') name,
       json_extract_scalar(json_parse(column), '$.likes') likes,
       json_extract_scalar(json_parse(column), '$.dislikes') dislikes
from dataset
Output:
customerid | name | likes | dislikes |
---|---|---|---|
12345 | John | Football, Running | |
54321 | Sam | Art | Hiking |
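A small variation on the same idea: parsing the JSON once in a derived table avoids repeating json_parse for every extracted key (a sketch against the same sample data):

```sql
--query
select json_extract_scalar(j, '$.customerid') customerid,
       json_extract_scalar(j, '$.name') name,
       json_extract_scalar(j, '$.likes') likes,
       json_extract_scalar(j, '$.dislikes') dislikes
from (
    -- parse each row's JSON once and reuse it
    select json_parse(column) j
    from dataset
)
```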
In case of many keys you can prettify it by casting the parsed JSON to a map (depending on the contents it can be map(varchar, varchar)
or map(varchar, json)
):
--query
select element_at(m, 'customerid') customerid,
       element_at(m, 'name') name,
       element_at(m, 'likes') likes,
       -- element_at returns NULL for absent keys; m['dislikes'] would
       -- error on rows that have no "dislikes" key
       element_at(m, 'dislikes') dislikes
from (
    select cast(json_parse(column) as map(varchar, varchar)) m
    from dataset
)
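If some rows can hold malformed JSON, json_parse will fail the whole query; wrapping the expression in try() turns those rows into NULL maps instead, so element_at then yields NULL for every extracted column of those rows (a sketch, assuming the same sample dataset):

```sql
--query
select element_at(m, 'customerid') customerid,
       element_at(m, 'name') name,
       element_at(m, 'likes') likes,
       element_at(m, 'dislikes') dislikes
from (
    -- try() returns NULL instead of failing the query on malformed JSON
    select try(cast(json_parse(column) as map(varchar, varchar))) m
    from dataset
)
```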