In my PostgreSQL database, I have two separate queries (q1, q2) joining across multiple tables assigning the same items to different groups (I call these subgroups) based on different criteria. I get query result 1 and 2 (qr1, qr2).
An item might appear in one or both, but within a result it is unique. I want to assign a new group id based on both subgroups and assigning the same group id if the subgroups share one or more items.
qr1 qr2
---------- ------------- ---------- -------------
| item | subgroup1 | | item | subgroup2 |
---------- ------------- ---------- -------------
| 1 | 1 | | 1 | 5 |
| 2 | 1 | | 5 | 6 |
| 3 | 2 | | 6 | 6 |
| 4 | 3 | | 7 | 7 |
| 5 | 3 | | 8 | 5 |
| 6 | 4 | | 10 | 5 |
---------- ------------- ---------- -------------
combined (interested in item and group):
--------- ------------ ------------ -------
| item | subgroup1 | subgroup2 | group |
--------- ------------ ------------ -------
| 1 | 1 | 5 | 1 |
| 2 | 1 | N | 1 |
| 3 | 2 | N | 2 |
| 4 | 3 | N | 3 |
| 5 | 3 | 6 | 3 |
| 6 | 4 | 6 | 3 |
| 7 | N | 7 | 4 |
| 8 | N | 5 | 1 |
| 10 | N | 5 | 1 |
--------- ------------ ------------ -------
- Subgroup 1 and 5 would be combined because of shared item 1. (group items 1, 2, 8, 10)
- Subgroup 3, 4! and 6 would be combined because of shared item 5. (group items 4, 5, 6)
- Subgroup 2 and 7 constitute their own group and do not share an item with any other group.
I tried to use window functions, count by duplicate item, and going from there. But I got stuck.
CodePudding user response:
As mentioned in the request comments, you need a recursive query. The recursive part is what I call cte im my query. There is an array called items that I use to avoid cycles.
The idea is per item I assign all other items that are directly or indirectly associated the same group. Then I aggregate by item and take the smallest associated item and thus detect all items belonging to the same group. I use DENSE_RANK
to get consecutive group numbers.
with recursive
qr1(item, subgroup) as (values (1,1), (2,1), (3,2), (4,3), (5,3), (6,4)),
qr2(item, subgroup) as (values (1,5), (5,6), (6,6), (7,7), (8,5), (10,5)),
qr(item, subgroup) as (select * from qr1 union all select * from qr2),
cte(item, other, items) as
(
select item, item, array[item]
from qr
union all
select cte.item, g.item, cte.items || g.item
from cte
join qr on qr.item = cte.other
join qr g on g.subgroup = qr.subgroup
where g.item <> all (cte.items)
)
select
item,
min(qr1.subgroup) as sg1,
min(qr2.subgroup) as sg2,
dense_rank() over (order by min(other)) as grp
from cte
left join qr1 using (item)
left join qr2 using (item)
group by item
order by item;