I have some Redshift SQL that I'm trying to convert to Snowflake SQL to extract values from a JSON field. The issue I'm running into is specifying the required array index.
Because I run A/B/n tests, there can be multiple indexes, up to 'n'.
So I had this piece of SQL working for Redshift:
SELECT JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (e.splits,n.n),'split_type') types
, JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (e.splits,n.n),'weight') as weight
FROM experiments e, (SELECT (p0.n + p1.n*2 + p2.n * POWER(2,2) + p3.n * POWER(2,3) + p4.n * POWER(2,4) + p5.n * POWER(2,5)
 + p6.n * POWER(2,6) + p7.n * POWER(2,7) + p8.n * POWER(2,8) + p9.n * POWER(2,9))::int as n
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7,
(SELECT 0 as n UNION SELECT 1) p8,
(SELECT 0 as n UNION SELECT 1) p9
Order by 1
) n
WHERE types <> ''
AND weight <> ''
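(As an aside, the ten-way cross join above just builds the integers 0-1023 from binary digits. If a number generator like that is still needed in Snowflake, a minimal sketch using the GENERATOR table function, assuming an upper bound of 1024 rows, would be:)
-- Sketch: Snowflake equivalent of the 0-1023 number generator.
-- SEQ4 may in principle contain gaps; for strictly gap-free numbering
-- use ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 instead.
SELECT SEQ4() AS n
FROM TABLE(GENERATOR(ROWCOUNT => 1024));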
From reading Snowflake's documentation, it seemed like the following should work:
SELECT parse_json(parse_json(e.splits)[n.n]):split_type as types,
parse_json(parse_json(e.splits)[n.n]):weight as weight
FROM experiments e, (SELECT (p0.n ...
However I get the error "SQL compilation error: error line 1 at position 39 invalid identifier 'N.N'"
I'm wondering if someone would be able to help with this issue?
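(One hedged sketch, in case a computed index really is needed: Snowflake's GET() function can index into a VARIANT array, so something along these lines might sidestep the bracket syntax. This is an untested assumption about the failing query, and the LATERAL FLATTEN approach in the answers below avoids the index generator entirely.)
-- Sketch only: GET(array, index) with a computed index.
SELECT GET(PARSE_JSON(e.splits), n.n):split_type::string AS types
      ,GET(PARSE_JSON(e.splits), n.n):weight AS weight
FROM experiments e
    ,(SELECT 0 AS n UNION SELECT 1) n  -- stand-in for the full number generator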
EDIT:
experiments table looks like:
exp_ID | splits
1 | [{"id":203,"weight":50,"split_type":"a"},{"id":204,"weight":50,"split_type":"control"}]
2 | [{"id":205,"weight":33.33,"split_type":"a"},{"id":206,"weight":33.33,"split_type":"b"},{"id":207,"weight":33.33,"split_type":"c"}]
3 | [{"id":208,"weight":25,"split_type":"a"},{"id":209,"weight":25,"split_type":"b"},{"id":210,"weight":25,"split_type":"c"},{"id":211,"weight":25,"split_type":"d"}]
required output:
exp_ID | ID | types | weight
1 | 203 | a | 50
1 | 204 | control | 50
2 | 205 | a | 33.33
2 | 206 | b | 33.33
2 | 207 | c | 33.33
3 | 208 | a | 25
3 | 209 | b | 25
3 | 210 | c | 25
3 | 211 | d | 25
CodePudding user response:
With a table defined as
create temp table EXPERIMENTS(EXP_ID int, SPLITS variant);
You can insert rows like this (this is just for testing; don't use single-row inserts in production pipelines). Note the INSERT ... SELECT form: PARSE_JSON can't be used directly in a VALUES clause:
insert into experiments select 1, parse_json('[{"id":203,"weight":50,"split_type":"a"},{"id":204,"weight":50,"split_type":"control"}]');
insert into experiments select 2, parse_json('[{"id":205,"weight":33.33,"split_type":"a"},{"id":206,"weight":33.33,"split_type":"b"},{"id":207,"weight":33.33,"split_type":"c"}]');
insert into experiments select 3, parse_json('[{"id":208,"weight":25,"split_type":"a"},{"id":209,"weight":25,"split_type":"b"},{"id":210,"weight":25,"split_type":"c"},{"id":211,"weight":25,"split_type":"d"}]');
With the data stored in a VARIANT column that way, you can query the JSON fields as columns like this:
select EXP_ID
      ,VALUE:id::int as ID               -- VALUE is one object from the SPLITS array
      ,VALUE:split_type::string as TYPES
      ,VALUE:weight as WEIGHT
from experiments
    ,lateral flatten(input => splits)    -- one output row per array element
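If the element's position within the array is also needed, FLATTEN exposes an INDEX column; a sketch against the same table:
select EXP_ID
      ,INDEX as SPLIT_INDEX              -- position within the SPLITS array
      ,VALUE:id::int as ID
      ,VALUE:split_type::string as TYPES
      ,VALUE:weight as WEIGHT
from experiments
    ,lateral flatten(input => splits)
order by EXP_ID, SPLIT_INDEX;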
CodePudding user response:
The article below demonstrates a variety of ways to use LATERAL FLATTEN to extract information from a JSON document, including examples that combine it with the GET_PATH, UNPIVOT, and SEQ functions.
https://community.snowflake.com/s/article/Dynamically-extracting-JSON-using-LATERAL-FLATTEN
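For example, GET_PATH is an alternative to the ':' path operator used above; a minimal sketch against the same experiments table:
-- GET_PATH(VALUE, 'split_type') is equivalent to VALUE:split_type.
select EXP_ID
      ,GET_PATH(VALUE, 'split_type')::string as TYPES
      ,GET_PATH(VALUE, 'weight') as WEIGHT
from experiments
    ,lateral flatten(input => splits);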