Home > database >  How to unnest my JSON data object and create a simplified JSON using Snowflake?
How to unnest my JSON data object and create a simplified JSON using Snowflake?

Time:04-12

My current JSON object looks like this:

-- create a sample table
create or replace table json_example(v variant);
-- create sample json record
insert into json_example
select parse_json(
    '[
      {
        "key": "variable_a",
        "value": {
          "double_value": null,
          "float_value": null,
          "int_value": null,
          "string_value": "https://example.com"
        }
      },
      {
        "key": "variable_b",
        "value": {
          "double_value": null,
          "float_value": null,
          "int_value": 2,
          "string_value": null
        }
      }
     ]');

And this is the simplified JSON that I am trying to achieve:

  {
    "variable_a": "https://example.com",
    "variable_b": 2
  }

How can I get the simplified JSON from the multilevel JSON object?

This is how I started to think:

select value:key::string as key, value:value:string_value::varchar as value
from json_example, lateral flatten(input => v)
union all
select value:key::string as key, value:value:int_value::varchar as value
from json_example, lateral flatten(input => v)

Thank you in advance.

CodePudding user response:

There are three parts to this:

  1. flatten the original JSON array and select the values you want from it
  2. create new JSON objects based on the resulting row values.
  3. combine the JSON objects into a single object.

Steps 1 and 2 are relatively straightforward:

with json_example(json) as (
  select parse_json(
      '[
        {
          "key": "variable_a",
          "value": {
            "double_value": null,
            "float_value": null,
            "int_value": null,
            "string_value": "https://example.com"
          }
        },
        {
          "key": "variable_b",
          "value": {
            "double_value": null,
            "float_value": null,
            "int_value": 2,
            "string_value": null
          }
        }
       ]'
  )
),
flattened_rows as
(
      select v.value:key::string as key, 
        v.value:value:int_value::int as int_value,
        v.value:value:string_value::string as string_value
        -- other values here
    from json_example, lateral flatten(input => json) as v
)
,
simplified_json as 
(
  select
    case when int_value is not null then object_construct(key, int_value)::variant
    -- add cases for other values here
    else object_construct(key, string_value)
    end as json
  from flattened_rows
)
select *
from simplified_json
;

flattened_rows looks like this:

KEY INT_VALUE STRING_VALUE
variable_a https://example.com
variable_b 2

simplified_json looks like this:

JSON
{ "variable_a": "https://example.com" }
{ "variable_b": 2 }

The last part is a little more tricky, since we have to combine all those rows into a single row. For that, we can use a UDTF.

create or replace function merge_object_rows(json variant) 
returns table(obj variant)
language javascript 
as '
{
    initialize: function (argumentInfo, context) {
        this.obj = {};
    },
    processRow: function (row, rowWriter, context) {
        Object.assign(this.obj, row.JSON);
    },
    finalize: function (rowWriter) {
        rowWriter.writeRow({OBJ: this.obj});
    }
}
';

The code above uses Object.assign for each row, to copy all the property values into a single object, and then it writes aggregate row data when the partition is complete.

Now, we can change our final SELECT to call this table function:

select merged.*
from simplified_json, table(merge_object_rows(json) over (partition by 1)) as merged

Results:

OBJ
{ { "variable_a": "https://example.com", "variable_b": 2 }

CodePudding user response:

So if you want everthing thing to be JSON text you can:

WITH data as (
select parse_json(
    '[
      {
        "key": "variable_a",
        "value": {
          "double_value": null,
          "float_value": null,
          "int_value": null,
          "string_value": "https://example.com"
        }
      },
      {
        "key": "variable_b",
        "value": {
          "double_value": null,
          "float_value": null,
          "int_value": 2,
          "string_value": null
        }
      }
     ]') as json
)
SELECT f.value:key::text as t_key
    ,try_to_double(f.value:value:double_value::text) as d_val
    ,try_to_double(f.value:value:float_value::text) as f_val
    ,try_to_number(f.value:value:int_value::text) as n_val
    ,f.value:value:string_value::text as s_val
    ,coalesce(d_val::text, f_val::text, n_val::text, s_val) as c_val
    ,object_construct(t_key, c_val) as obj
FROM DATA, lateral flatten(input=>json) f
T_KEY D_VAL F_VAL N_VAL S_VAL C_VAL OBJ
variable_a https://example.com https://example.com
variable_b 2 2

Which then shows us how to build a CASE statement, and build clean native objects like:

SELECT 
    case 
        when not is_null_value(f.value:value:double_value) 
            then object_construct(f.value:key::text, try_to_double(f.value:value:double_value::text))
        when not is_null_value(f.value:value:float_value) 
            then object_construct(f.value:key::text, try_to_double(f.value:value:float_value::text))
        when not is_null_value(f.value:value:int_value) 
            then object_construct(f.value:key::text, try_to_number(f.value:value:int_value::text))
        else 
            object_construct(f.value:key::text, f.value:value:string_value::text)
    end obj
FROM DATA, lateral flatten(input=>json) f
OBJ
{ "variable_a": "https://example.com" }
{ "variable_b": 2 }

Which can be turned into a single object like so:

SELECT 
    object_agg(f.value:key, 
    case 
        when not is_null_value(f.value:value:double_value) 
            then try_to_double(f.value:value:double_value::text)
        when not is_null_value(f.value:value:float_value) 
            then try_to_double(f.value:value:float_value::text)
        when not is_null_value(f.value:value:int_value) 
            then try_to_number(f.value:value:int_value::text)
        else 
            f.value:value:string_value
    end
   ) as obj
FROM DATA, lateral flatten(input=>json) f
OBJ
{ "variable_a": "https://example.com", "variable_b": 2 }
  • Related