Snowflake latteral flatten data types-CodePudding

I have a table containing an id column and a json column(variant data type). I want to flatten the data, make the value column a variant, assign each value in the value column a data type if a condition is met, then eventually pivot the data and have each column be the correct data type.

Example code that doesn't work:

with cte as (
    select 
        1 as id,
        parse_json('{
            "field1":"TRUE",
            "field2":"some string",
            "field3":"1.035",
            "field4":"097334"
        }') as my_output
)

select 
    id,
    key,
    to_variant(
        case
            when value in ('true', 'false') then value::boolean
            when value like ('1.0') then value::decimal
            else value::string
        end) as value
from cte, lateral flatten(my_output)

Ultimately, I'd like to pivot the data and have a wide table with columns id, field1, field2, etc. where field1 is boolean, field2 is string, field3 is a decimal etc.

This is just a simple example, instead of 4 fields, I'm dealing with hundreds. Is this possible?

For the pivot, I'm using dbt_utils.get_column_values to get the column names dynamically. I'd really prefer a solution that doesn't involve listing out the column names, especially since there are hundreds.

CodePudding user response：

Since you'd have to define each column in your PIVOT statement, anyway, it'd probably be much easier to simply select each attribute directly and cast to the correct data type, rather than using a lateral flatten.

select
    my_output.field1::boolean,
    my_output.field2::string,
    my_output.field3::decimal(5,3),
    my_output.field4::string
from cte;

Alternatively, if you want this to be dynamically created, you could create a stored procedure that dynamically uses your json to create a view over your table that has this select in it.

CodePudding user response：

Solution ended up being

select
    id,
    key,
    ifnull(try_parse_json(value), value) as value_mod,
    typeof(value_mod)
from cte, lateral flatten(my_output)

Leading zeros are removed so things like zip codes have to be accounted for.