I need to group users so that they are grouped together if they have the same name or email. That is, to combine them into one group it is not necessary that everyone has the same e-mail or name, it is enough that the name or e-mail coincides with at least one other user. Example in the picture
CodePudding user response:
Here is a solution with a recursive query. For every user ID I go from related row to related row and remember all IDs I already looked up in the chain (ids) and the maximum user ID (maxid, which will finally become my group key).
with cte (userid, otherid, name, email, maxid, ids) as
(
select
userid, userid, name, email, userid, '#' cast(userid as varchar(max)) '#'
from mytable
union all
select
cte.userid, t.userid, t.name, t.email,
case when t.userid > cte.userid then t.userid else cte.userid end,
cte.ids cast(t.userid as varchar(max)) '#'
from cte
join mytable t on cte.ids not like '%#' cast(t.userid as varchar) '#%'
and (t.name = cte.name or t.email = cte.email)
)
select userid, dense_rank() over (order by max(maxid)) as grp
from cte
group by userid
order by max(maxid), userid;
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=826abe47fbc996696472731937f9a733
CodePudding user response:
Not the answer you probably want to hear, but I don't think this query can be written in SQL Server. It requires the engine to implement UNION
in recursive CTEs, but SQL Server doesn't [yet] implement it.
This can be done in PostgreSQL, however, as shown below:
with recursive
n as (
select name as r, id, name, email from t
union -- this is what's missing in SQL Server
select n.r, t.*
from n
join t on t.name = n.name or t.email = n.email
),
m as(
select r, min(id) as mid
from n
group by r
),
l as (
select *, row_number() over() as g
from (select *, row_number() over(partition by mid order by r) as rn from m) x
where rn = 1
)
select n.id, n.name, n.email, l.g
from n
join l on l.r = n.r
order by n.id;
Result:
id name email g
--- ----- ------------------- -
1 mc [email protected] 1
2 cm [email protected] 2
3 cm [email protected] 2
4 dh [email protected] 3
5 wa [email protected] 1
6 mc [email protected] 1
7 cmu [email protected] 3
8 cc [email protected] 3
9 gt [email protected] 1
10 hm [email protected] 2
See example at db<>fiddle.