Home > OS >  How to merge overlapped groups in Snowflake
How to merge overlapped groups in Snowflake

Time:07-22

I have a many-many relationship table, and I want to find the overlapped groups and merge them into one.

In the example below, user 2 is in groups 7 and 8, so groups 7 and 8 should be merged into one that contains the records 1, 2, 4. The merged group id can be either 7 or 8, it doesn't matter.

user_id group
1 7
2 7
2 8
4 8
5 9
6 9

I wish to see output like this:

user_id group
1 7
2 7
4 7
5 9
6 9

CodePudding user response:

I don't understand why the group ID for 4 should be 7. Beside there one record is (user_id, group)-->(4,7), then I see.

WITH data AS (        
    SELECT * FROM VALUES 
        (1,7),
        (2,7),
        (2,8), 
        (4,8),
        (5,9),
        (6,9)
        v(user_id, group_id)
)
select user_id, group_id from
( select
  user_id,
  group_id,
  row_number() over (partition by user_id order by group_id) as row_number
from data ) a where row_number = 1
order by user_id;

CodePudding user response:

One way:

select user_id, STRTOK(listagg(group, ', ') within group (ORDER BY user_id ),',',1)
from <table>
GROUP BY user_id ORDER BY user_id;

CodePudding user response:

This is similar to the one asked earlier, where-in grouping needs to be done to the top level in hierarchy.

The below query aggregates user_id based on group_id into array and then compares those arrays with each other. When two arrays match they both get same group id. Once arrays match and they have been assigned their parent group id based on minimum group value, we need to get the top of the hierarchy. There could also be multiple hierarchies in the data-set, so we set starting point of each hierarchy as NULL. Lastly, we use hierarchical query to get the final grouping.

with data(user_id,group_id) as (
  select * from values
  (1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
  (6,9),(70,8),(21,51),(22,51),(23,52),
  (24,51),(24,52),(25,26)
),cte_1 as
  (select group_id,array_agg(user_id) arr
  from data
  group by group_id
), cte_2 as
  (select c1.group_id g1, c2.group_id g2 ,
  c1.arr arr1, c2.arr arr2,
  case when arrays_overlap(arr1, arr2) then g1 end flag,
  min(flag) over (partition by g2) grp,
  case when g2 <> grp then grp end final_grp
  from cte_1 c1, cte_1 c2
), cte_3 as 
  (select distinct g2, connect_by_root g2 as parent from cte_2
  start with final_grp is null
  connect by final_grp = prior g2
  order by g2
), cte_4 as
  (select c3.parent, c1.arr
  from cte_1 c1 left join cte_3 c3
  where c1.group_id = c3.g2
) select distinct value, parent as final_group 
  from cte_4,
  lateral flatten(input=>arr)
  order by value;
  
VALUE FINAL_GROUP
1 7
2 7
4 7
5 7
6 7
21 51
22 51
23 51
24 51
25 26
70 7
  • Related