Convert struct to map in Spark SQL

I am trying to convert a dataset that declares a column of a certain struct type (e.g. struct<x: string, y: string>) to a map<string, string> type. I would like to do it in SQL, ideally without using UDFs.

Example input data:

WITH input (struct_col) as (
    select named_struct('x', 'valX', 'y', 'valY') union all
    select named_struct('x', 'valX1', 'y', 'valY2')
)
select *
from input

The expected output is a column of type map<string, string>:

struct_col:map<string, string>
{"x":"valX","y":"valY"}
{"x":"valX1","y":"valY2"}

UPDATE:

So far I have managed to find this very convoluted solution, which works only with Spark >= 3.1.0 (because of the json_object_keys function). It would be nice to be able to simply cast a struct to a map:

WITH input (struct_col) as (
    select named_struct('x', 'valX', 'y', 'valY') union all
    select named_struct('x', 'valX1', 'y', 'valY2')
)
select transform_values(
       map_from_arrays(
               -- keys of the struct, extracted from its JSON rendering
               json_object_keys(to_json(struct_col)),
               -- the same keys again, used only as placeholder values
               json_object_keys(to_json(struct_col))
           ),
        -- replace each placeholder with the actual field value
        (k, v) -> get_json_object(to_json(struct_col), '$.' || k)) as map_col
from input

CodePudding user response:

I have found a way to do it that requires one round trip of serializing the struct to JSON and parsing it back, using the to_json and from_json functions. The trick is that from_json also takes a schema argument, where I pass the map<string, string> type.

This solution should also work with Spark versions below 3.x.

WITH input (struct_col) as (
    select named_struct('x', 'valX', 'y', 'valY')
    union all
    select named_struct('x', 'valX1', 'y', 'valY2')
) 
select from_json(to_json(struct_col), 'map<string, string>') as map_col
from input;
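
Once converted, the column behaves like any other map. A quick usage sketch reusing the alias map_col from the query above (element access and map_keys are standard Spark SQL):

WITH input (struct_col) as (
    select named_struct('x', 'valX', 'y', 'valY')
)
select map_col['x'] as x_val, map_keys(map_col) as key_list
from (
    select from_json(to_json(struct_col), 'map<string, string>') as map_col
    from input
) t;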

CodePudding user response:

What about building the map explicitly from the struct fields, using map (create_map in the DataFrame API):

map('x', struct_col.x, 'y', struct_col.y)
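
For instance, applied to the sample input from the question (a minimal sketch; unlike the from_json round trip, every field name has to be spelled out by hand):

WITH input (struct_col) as (
    select named_struct('x', 'valX', 'y', 'valY') union all
    select named_struct('x', 'valX1', 'y', 'valY2')
)
select map('x', struct_col.x, 'y', struct_col.y) as map_col
from input;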