I am trying to convert one dataset which declares a column to have a certain struct
type (eg. struct<x: string, y: string>
) to a map<string, string>
type. And I would like to do it in SQL, possibly without using UDFs.
Example input data:
WITH input (struct_col) as (
select named_struct('x', 'valX', 'y', 'valY') union all
select named_struct('x', 'valX1', 'y', 'valY2')
)
select *
from input
expected output is a column of type map<string, string>
struct_col:map<string, string> |
---|
{"x":"valX","y":"valY"} |
{"x":"valX1","y":"valY2"} |
UPDATE:
So far I managed to find this very convoluted solution which works only with Spark >= 3.1.0
(because of json_object_keys
function). Would be so nice to just be able to cast a struct to a map
WITH input (struct_col) as (
select named_struct('x', 'valX', 'y', 'valY') union all
select named_struct('x', 'valX1', 'y', 'valY2')
)
select transform_values(
map_from_arrays(
json_object_keys(to_json(struct_col)),
json_object_keys(to_json(struct_col))
),
(k, v) -> get_json_object(to_json(struct_col), '$.' || k))
from input
CodePudding user response:
I have found a way to do it which requires one roundtrip of serializing and parsing a json using the to_json
and from_json
functions. The trick is that the from_json
also takes a schema argument where I use the map<string, string>
type.
Also, this solution should work with spark < 3.x
WITH input (struct_col) as (
select named_struct('x', 'valX', 'y', 'valY')
union all
select named_struct('x', 'valX1', 'y', 'valY2')
)
select from_json(to_json(struct_col), 'map<string, string>') as map_col
from input;
CodePudding user response:
What about
create_map('struct_col.x', 'struct_col.valX', 'struct_col.y', 'struct_col.valY')