How can I extract a json column into new columns automatically in Snowflake SQL?

Time:10-30

This is an example taken from another thread, but essentially I would like to achieve this:

Sample data

ID  Name  Value
1   TV1   {"URL": "www.url.com", "Icon": "some_icon"}
2   TV2   {"URL": "www.url.com", "Icon": "some_icon", "Facebook": "Facebook_URL"}
3   TV3   {"URL": "www.url.com", "Icon": "some_icon", "Twitter": "Twitter_URL"}
..........

Expected output

ID  Name  URL          Icon           Facebook      Twitter
1   TV1   www.url.com  some_icon          NULL         NULL
2   TV2   www.url.com  some_icon  Facebook_URL         NULL
3   TV3   www.url.com  some_icon          NULL  Twitter_URL

I'm totally new to Snowflake, so I'm scratching my head over how to do this easily, and hopefully automatically: some rows might have more elements in the JSON than other rows, which would be tedious to assign manually. Some rows might have sub-categories too.

I found the parse_json function for Snowflake, but it's only giving me the same json column in a new column, still in json format.

TIA!

CodePudding user response:

You can create a view over your table with the following SELECT:

SELECT ID, 
       Name,
       Value:URL::varchar as URL,
       Value:Icon::varchar as Icon,
       Value:Facebook::varchar as Facebook,
       Value:Twitter::varchar as Twitter
FROM tablename;

Additional attributes will be ignored unless you add them to the view. There is no way to "automatically" include them in the view, but you could create a stored procedure that dynamically generates the view from all the attributes found in the table's variant column.
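The dynamic generation mentioned above boils down to taking the union of the keys seen in the data and emitting one extraction per key. Below is a minimal sketch of that step in plain JavaScript, run outside Snowflake. The helper names `collectKeys` and `buildViewSelect` are hypothetical, and the sketch assumes every key is a plain identifier that can be used unquoted in SQL.

```javascript
// Hypothetical helper: union of the top-level keys found in the sampled JSON objects.
function collectKeys(rows) {
  const keys = new Set();
  for (const row of rows) {
    Object.keys(row).forEach((key) => keys.add(key));
  }
  return [...keys]; // a Set keeps first-seen order
}

// Hypothetical helper: builds the same kind of SELECT as above from the key list.
// Assumes every key is a plain identifier (letters, digits, underscores).
function buildViewSelect(tableName, fixedCols, variantCol, keys) {
  const extracted = keys.map((key) => `${variantCol}:${key}::varchar AS ${key}`);
  const selectList = [...fixedCols, ...extracted].join(",\n       ");
  return `SELECT ${selectList}\nFROM ${tableName};`;
}

// Sample rows copied from the question.
const sampleRows = [
  { URL: "www.url.com", Icon: "some_icon" },
  { URL: "www.url.com", Icon: "some_icon", Facebook: "Facebook_URL" },
  { URL: "www.url.com", Icon: "some_icon", Twitter: "Twitter_URL" },
];
console.log(buildViewSelect("tablename", ["ID", "Name"], "Value", collectKeys(sampleRows)));
```

For the sample rows this prints a SELECT with the URL, Icon, Facebook and Twitter extractions in first-seen order. Inside Snowflake, the rows would come from a query against the table rather than from a literal array.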

CodePudding user response:

You can create a stored procedure (SP) that automatically builds the CREATE VIEW statement for you, based on the JSON data in the VARIANT columns.

Here is a simple example:

-- prepare the table and data
create or replace table test (
  col1 int, col2 string, 
  data1 variant, data2 variant
);

insert into test select 1,2, parse_json(
   '{"URL": "test", "Icon": "test1", "Facebook": "http://www.facebook.com"}'
), parse_json(
   '{"k1": "test", "k2": "test1", "k3": "http://www.facebook.com"}'
);

insert into test select 3,4,parse_json(
   '{"URL": "test", "Icon": "test1", "Twitter": "http://www.twitter.com"}'
), parse_json(
   '{"k4": "v4", "k3": "http://www.ericlin.me"}'
);

-- create the SP; it takes the table name and works out
-- which columns hold VARIANT data
create or replace procedure create_view(
    table_name varchar
)
returns string
language javascript
as
$$
  var final_columns = [];
  
  // first, find out the columns
  var query = `SHOW COLUMNS IN TABLE ${TABLE_NAME}`;
  var stmt = snowflake.createStatement({sqlText: query});
  var result = stmt.execute();
  
  var variant_columns = [];
  
  while (result.next()) {
    var col_name = result.getColumnValue(3);
    var data_type = JSON.parse(result.getColumnValue(4));

    // just use it if it is not a VARIANT type
    // if it is variant type, we need to remember this column
    // and then run query against it later
    if (data_type["type"] != "VARIANT") {
      final_columns.push(col_name);
    } else {
      variant_columns.push(col_name);
    }
  }

  var columns = {};
  query = `SELECT ` + variant_columns.join(', ') + ` FROM ${TABLE_NAME}`;
  stmt = snowflake.createStatement({sqlText: query});
  result = stmt.execute();

  while (result.next()) {
      for(var i=1; i<=variant_columns.length; i++) {
        var sub_result = result.getColumnValue(i);
        if(!sub_result) {
          continue;
        }

        var keys = Object.keys(sub_result);

        for(var j=0; j<keys.length; j++) {
          columns[variant_columns[i-1] + ":" + keys[j]] = keys[j];
        }
      }
  }

  for(path in columns) {
    final_columns.push(path + "::STRING AS " + columns[path]);
  }

  var create_view_sql = "CREATE OR REPLACE VIEW " +
    TABLE_NAME + "_VIEW\n" +
    "AS SELECT " + "\n" +
    "  " + final_columns.join(",\n  ") + "\n" +
    "FROM " + TABLE_NAME + ";";
    
  snowflake.execute({sqlText: create_view_sql});
  return create_view_sql + "\n\nVIEW created successfully.";
$$;

Executing the stored procedure returns the following string:

call create_view('TEST');
+---------------------------------------+
| CREATE_VIEW                           |
|---------------------------------------|
| CREATE OR REPLACE VIEW TEST_VIEW      |
| AS SELECT                             |
|   COL1,                               |
|   COL2,                               |
|   DATA1:Facebook::STRING AS Facebook, |
|   DATA1:Icon::STRING AS Icon,         |
|   DATA1:URL::STRING AS URL,           |
|   DATA2:k1::STRING AS k1,             |
|   DATA2:k2::STRING AS k2,             |
|   DATA2:k3::STRING AS k3,             |
|   DATA1:Twitter::STRING AS Twitter,   |
|   DATA2:k4::STRING AS k4              |
| FROM TEST;                            |
|                                       |
| VIEW created successfully.            |
+---------------------------------------+

Then query the VIEW:

SELECT * FROM TEST_VIEW;
+------+------+-------------------------+-------+------+------+-------+-------------------------+------------------------+------+
| COL1 | COL2 | FACEBOOK                | ICON  | URL  | K1   | K2    | K3                      | TWITTER                | K4   |
|------+------+-------------------------+-------+------+------+-------+-------------------------+------------------------+------|
|    1 | 2    | http://www.facebook.com | test1 | test | test | test1 | http://www.facebook.com | NULL                   | NULL |
|    3 | 4    | NULL                    | test1 | test | NULL | NULL  | http://www.ericlin.me   | http://www.twitter.com | v4   |
+------+------+-------------------------+-------+------+------+-------+-------------------------+------------------------+------+

Query the source table:

SELECT * FROM TEST;
+------+------+------------------------------------------+-----------------------------------+
| COL1 | COL2 | DATA1                                    | DATA2                             |
|------+------+------------------------------------------+-----------------------------------|
|    1 | 2    | {                                        | {                                 |
|      |      |   "Facebook": "http://www.facebook.com", |   "k1": "test",                   |
|      |      |   "Icon": "test1",                       |   "k2": "test1",                  |
|      |      |   "URL": "test"                          |   "k3": "http://www.facebook.com" |
|      |      | }                                        | }                                 |
|    3 | 4    | {                                        | {                                 |
|      |      |   "Icon": "test1",                       |   "k3": "http://www.ericlin.me",  |
|      |      |   "Twitter": "http://www.twitter.com",   |   "k4": "v4"                      |
|      |      |   "URL": "test"                          | }                                 |
|      |      | }                                        |                                   |
+------+------+------------------------------------------+-----------------------------------+
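One caveat: the SP uses the bare key as the column alias, so if the same key appeared in both DATA1 and DATA2, the generated view would contain two columns with the same name and would most likely be rejected. The sample data avoids this only because the two columns share no keys. A minimal sketch of one way around it, in plain JavaScript outside Snowflake: the helper `selectListWithUniqueAliases` is hypothetical, and it compares keys case-insensitively on the assumption that unquoted aliases are folded to upper case.

```javascript
// Hypothetical helper: prefixes the alias with the column name whenever a key
// occurs more than once across the VARIANT columns (compared case-insensitively).
function selectListWithUniqueAliases(paths) {
  const counts = new Map();
  for (const { key } of paths) {
    const folded = key.toUpperCase();
    counts.set(folded, (counts.get(folded) || 0) + 1);
  }
  return paths.map(({ column, key }) => {
    const alias = counts.get(key.toUpperCase()) > 1 ? `${column}_${key}` : key;
    return `${column}:${key}::STRING AS ${alias}`;
  });
}

// Made-up input in which "k3" occurs in both columns.
const examplePaths = [
  { column: "DATA1", key: "URL" },
  { column: "DATA1", key: "k3" },
  { column: "DATA2", key: "k3" },
];
console.log(selectListWithUniqueAliases(examplePaths).join(",\n"));
```

Keys that are unique keep their short alias, so the view from the example above would come out unchanged.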

You can refine this SP to detect nested data and add the nested keys to the column list as well.
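A minimal sketch of that refinement, written as plain JavaScript that runs outside Snowflake. The helpers `collectPaths` and `toSelectItems` are hypothetical. The sketch assumes nested keys are plain identifiers, treats arrays as leaf values, and joins the path with underscores to form the alias; the logic would need adapting before being placed inside the procedure.

```javascript
// Hypothetical helper: returns every leaf path in a nested object as an array of keys.
// Arrays and scalars count as leaves; empty objects contribute no paths.
function collectPaths(value, prefix = []) {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return prefix.length ? [prefix] : [];
  }
  return Object.keys(value).flatMap((key) => collectPaths(value[key], [...prefix, key]));
}

// Hypothetical helper: turns the leaf paths into SELECT items such as
// DATA1:Social.Facebook::STRING AS Social_Facebook
function toSelectItems(column, obj) {
  return collectPaths(obj).map(
    (path) => `${column}:${path.join(".")}::STRING AS ${path.join("_")}`
  );
}

// Made-up nested document for illustration.
const nested = { URL: "test", Social: { Facebook: "fb", Twitter: "tw" } };
console.log(toSelectItems("DATA1", nested).join(",\n"));
```

Each nested leaf becomes its own column, while top-level keys produce the same items as the original procedure.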
