How to combine previously grouped rows by one or more shared rows?-CodePudding

In my PostgreSQL database, I have two separate queries (q1, q2) joining across multiple tables assigning the same items to different groups (I call these subgroups) based on different criteria. I get query result 1 and 2 (qr1, qr2).

An item might appear in one or both, but within a result it is unique. I want to assign a new group id based on both subgroups and assigning the same group id if the subgroups share one or more items.

qr1                           qr2
 ---------- -------------      ---------- ------------- 
| item     | subgroup1    |   | item     | subgroup2   |
 ---------- -------------      ---------- ------------- 
|       1  |           1 |    |        1 |           5 |
|       2  |           1 |    |        5 |           6 |
|       3  |           2 |    |        6 |           6 |
|       4  |           3 |    |        7 |           7 |
|       5  |           3 |    |        8 |           5 |
|       6  |           4 |    |       10 |           5 |
 ---------- -------------      ---------- -------------

combined (interested in item and group):
 --------- ------------ ------------ ------- 
| item    | subgroup1  | subgroup2  | group |
 --------- ------------ ------------ ------- 
|  1      | 1          | 5          | 1     |
|  2      | 1          | N          | 1     |
|  3      | 2          | N          | 2     |
|  4      | 3          | N          | 3     |
|  5      | 3          | 6          | 3     |
|  6      | 4          | 6          | 3     |
|  7      | N          | 7          | 4     |
|  8      | N          | 5          | 1     |
| 10      | N          | 5          | 1     |
 --------- ------------ ------------ -------

Subgroup 1 and 5 would be combined because of shared item 1. (group items 1, 2, 8, 10)
Subgroup 3, 4! and 6 would be combined because of shared item 5. (group items 4, 5, 6)
Subgroup 2 and 7 constitute their own group and do not share an item with any other group.

I tried to use window functions, count by duplicate item, and going from there. But I got stuck.

CodePudding user response：

As mentioned in the request comments, you need a recursive query. The recursive part is what I call cte im my query. There is an array called items that I use to avoid cycles.

The idea is per item I assign all other items that are directly or indirectly associated the same group. Then I aggregate by item and take the smallest associated item and thus detect all items belonging to the same group. I use DENSE_RANK to get consecutive group numbers.

with recursive
  qr1(item, subgroup) as (values (1,1), (2,1), (3,2), (4,3), (5,3), (6,4)),
  qr2(item, subgroup) as (values (1,5), (5,6), (6,6), (7,7), (8,5), (10,5)),
  qr(item, subgroup) as (select * from qr1 union all select * from qr2),
  cte(item, other, items) as
  (
    select item, item, array[item]
    from qr
    union all
    select cte.item, g.item, cte.items || g.item
    from cte
    join qr on qr.item = cte.other
    join qr g on g.subgroup = qr.subgroup
    where g.item <> all (cte.items)
  )
select
  item, 
  min(qr1.subgroup) as sg1,
  min(qr2.subgroup) as sg2,
  dense_rank() over (order by min(other)) as grp
from cte
left join qr1 using (item)
left join qr2 using (item)
group by item
order by item;

Demo: https://dbfiddle.uk/fG6AnX6l