Home > Net >  Group users by at least one of two parameters
Group users by at least one of two parameters

Time:03-10

I need to group users so that they are grouped together if they have the same name or email. That is, to combine them into one group it is not necessary that everyone has the same e-mail or name, it is enough that the name or e-mail coincides with at least one other user. Example in the picture

example

CodePudding user response:

Here is a solution with a recursive query. For every user ID I go from related row to related row and remember all IDs I already looked up in the chain (ids) and the maximum user ID (maxid, which will finally become my group key).

with cte (userid, otherid, name, email, maxid, ids) as
(
  select
    userid, userid, name, email, userid, '#'   cast(userid as varchar(max))   '#'
  from mytable
  union all
  select
    cte.userid, t.userid, t.name, t.email, 
    case when t.userid > cte.userid then t.userid else cte.userid end,
    cte.ids   cast(t.userid as varchar(max))   '#'
  from cte
  join mytable t on cte.ids not like '%#'   cast(t.userid as varchar)   '#%'
                 and (t.name = cte.name or t.email = cte.email)
)
select userid, dense_rank() over (order by max(maxid)) as grp
from cte 
group by userid 
order by max(maxid), userid;

Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=826abe47fbc996696472731937f9a733

CodePudding user response:

Not the answer you probably want to hear, but I don't think this query can be written in SQL Server. It requires the engine to implement UNION in recursive CTEs, but SQL Server doesn't [yet] implement it.

This can be done in PostgreSQL, however, as shown below:

with recursive
n as (
  select name as r, id, name, email from t
 union -- this is what's missing in SQL Server
  select n.r, t.*
  from n
  join t on t.name = n.name or t.email = n.email
),
m as(
  select r, min(id) as mid
  from n
  group by r
),
l as (
  select *, row_number() over() as g
  from (select *, row_number() over(partition by mid order by r) as rn from m) x
  where rn = 1
)
select n.id, n.name, n.email, l.g
from n
join l on l.r = n.r
order by n.id;

Result:

 id  name  email               g 
 --- ----- ------------------- - 
 1   mc    [email protected]     1 
 2   cm    [email protected]  2 
 3   cm    [email protected]        2 
 4   dh    [email protected]       3 
 5   wa    [email protected]     1 
 6   mc    [email protected]       1 
 7   cmu   [email protected]       3 
 8   cc    [email protected]       3 
 9   gt    [email protected]     1 
 10  hm    [email protected]  2

See example at db<>fiddle.

  • Related