I have a table structure in snowflake with variant data type as shown below, you can see the a single ID is having multiple variant objects.
----- --------------------------
| ID | STATE_INFO |
|----- --------------------------|
| IND | { |
| | "population": "1000k", |
| | "state": "KA" |
| | } |
| IND | { |
| | "population": "2000k", |
| | "state": "AP" |
| | } |
| IND | { |
| | "population": "3000K", |
| | "state": "TN" |
| | } |
| US | { |
| | "population": "100k", |
| | "state": "Texas" |
| | } |
| US | { |
| | "population": "200k", |
| | "state": "Florida" |
| | } |
| US | { |
| | "population": "300K", |
| | "state": "Iowa" |
| | } |
----- --------------------------
I want to combine these variant objects into a single object like below by merging the rows into one array or dictionary object
----- ---------------------------
| ID | STATE_INFO |
|----- ---------------------------|
| IND | [{ |
| | "population": "1000k", |
| | "state": "KA" |
| | }, |
| | { |
| | "population": "2000k", |
| | "state": "AP" |
| | }, |
| | { |
| | "population": "3000K", |
| | "state": "TN" |
| | }] |
| US | [{ |
| | "population": "100k", |
| | "state": "Texas" |
| | }, |
| | { |
| | "population": "200k", |
| | "state": "Florida" |
| | }, |
| | { |
| | "population": "300K", |
| | "state": "Iowa" |
| | }] |
----- ---------------------------
Like in SQL terminologies, we can say like below SQL statement
Select id,merge(STATE_INFO) from table group by id;
CodePudding user response:
Like Mike said ARRAY_AGG
function is what you need and it works on a variant column
select id, array_agg(STATE_INFO) within group (order by id) STATE_INFO
from table
group by 1
order by 1
CodePudding user response:
Using this CTE for data:
With data(id, state_info) as (
select column1, parse_json(column2)
from values
('IND', '{ "population": "1000k", "state": "KA" }'),
('IND', '{ "population": "2000k", "state": "AP" }'),
('IND', '{ "population": "3000K", "state": "TN" }'),
('US', '{ "population": "100k", "state": "Texas" }'),
('US', '{ "population": "200k", "state": "Florida" }'),
('US', '{ "population": "300K", "state": "Iowa" }')
)
This code is is almost exactly the same is demircioglu's answer, but has no ordering of the array content.
select id, array_agg(state_info) as stateinfo
from data
group by 1;
which because of the order of the input still appears ordered. But it is really random, it depends if you need the data ordered or not:
ID | STATEINFO |
---|---|
US | [ { "population": "100k", "state": "Texas" }, { "population": "200k", "state": "Florida" }, { "population": "300K", "state": "Iowa" } ] |
IND | [ { "population": "1000k", "state": "KA" }, { "population": "2000k", "state": "AP" }, { "population": "3000K", "state": "TN" } ] |