Merging Variant rows in Snowflake-CodePudding

I have a table structure in snowflake with variant data type as shown below, you can see the a single ID is having multiple variant objects.

 ----- -------------------------- 
| ID  | STATE_INFO               |
|----- --------------------------|
| IND | {                        |
|     |   "population": "1000k", |
|     |   "state": "KA"          |
|     | }                        |
| IND | {                        |
|     |   "population": "2000k", |
|     |   "state": "AP"          |
|     | }                        |
| IND | {                        |
|     |   "population": "3000K", |
|     |   "state": "TN"          |
|     | }                        |
| US  | {                        |
|     |   "population": "100k",  |
|     |   "state": "Texas"       |
|     | }                        |
| US  | {                        |
|     |   "population": "200k",  |
|     |   "state": "Florida"     |
|     | }                        |
| US  | {                        |
|     |   "population": "300K",  |
|     |   "state": "Iowa"        |
|     | }                        |
 ----- --------------------------

I want to combine these variant objects into a single object like below by merging the rows into one array or dictionary object

 ----- --------------------------- 
| ID  | STATE_INFO                |
|----- ---------------------------|
| IND | [{                        |
|     |   "population": "1000k",  |
|     |   "state": "KA"           |
|     | },                        |
|     | {                         |
|     |   "population": "2000k",  |
|     |   "state": "AP"           |
|     | },                        |
|     | {                         |
|     |   "population": "3000K",  |
|     |   "state": "TN"           |
|     | }]                        |
| US  | [{                        |
|     |   "population": "100k",   |
|     |   "state": "Texas"        |
|     | },                        |
|     | {                         |
|     |   "population": "200k",   |
|     |   "state": "Florida"      |
|     | },                        |
|     | {                         |
|     |   "population": "300K",   |
|     |   "state": "Iowa"         |
|     | }]                        |
 ----- ---------------------------

Like in SQL terminologies, we can say like below SQL statement

Select id,merge(STATE_INFO) from table group by id;

CodePudding user response：

Like Mike said ARRAY_AGG function is what you need and it works on a variant column

select id, array_agg(STATE_INFO) within group (order by id) STATE_INFO
from table
group by 1
order by 1

CodePudding user response：

Using this CTE for data:

With data(id, state_info) as (
    select column1, parse_json(column2)
    from values
        ('IND', '{ "population": "1000k", "state": "KA" }'),                        
        ('IND', '{ "population": "2000k", "state": "AP" }'),                        
        ('IND', '{ "population": "3000K", "state": "TN" }'),                                
        ('US', '{ "population": "100k", "state": "Texas" }'),                         
        ('US', '{ "population": "200k", "state": "Florida" }'),                         
        ('US', '{ "population": "300K", "state": "Iowa" }')                     
)

This code is is almost exactly the same is demircioglu's answer, but has no ordering of the array content.

select id, array_agg(state_info) as stateinfo
from data
group by 1;

which because of the order of the input still appears ordered. But it is really random, it depends if you need the data ordered or not:

ID	STATEINFO
US	[ { "population": "100k", "state": "Texas" }, { "population": "200k", "state": "Florida" }, { "population": "300K", "state": "Iowa" } ]
IND	[ { "population": "1000k", "state": "KA" }, { "population": "2000k", "state": "AP" }, { "population": "3000K", "state": "TN" } ]