FInding associated values from same column in SQL-CodePudding

I am trying to find assosiate sid for each sid. I was trying with self join but could not find the count for sid linked with associated sid.

Data :

gid sid
g1  s1
g1  s2
g2  s2
g2  s3
g2  s4
g3  s1
g3  s3
g4  s2
g4  s3
g4  s4
g2  s2

Expected Output:

sid associated_sid  freq of sid occuring with other sid
s1  s2              1
s1  s3              1
s2  s1              1
s2  s3, s3          2
s2  s4, s4          2
s3  s1              1
s3  s2, s2          2
s3  s4, s4          2
s4  s2, s2          2
s4  s3, s3          2

CodePudding user response：

Use a self join and aggregation with GROUP_CONCAT():

SELECT t1.sid,
       GROUP_CONCAT(t2.sid) associated_sid,
       COUNT(*) freq
FROM (SELECT DISTINCT gid, sid FROM tablename) t1 
INNER JOIN (SELECT DISTINCT gid, sid FROM tablename) t2
ON t2.gid = t1.gid AND t2.sid <> t1.sid
GROUP BY t1.sid, t2.sid
ORDER BY t1.sid, t2.sid;

Or, with a CTE (MySql version 8.0 ):

WITH cte AS (SELECT DISTINCT gid, sid FROM tablename)
SELECT t1.sid,
       GROUP_CONCAT(t2.sid) associated_sid,
       COUNT(*) freq
FROM cte t1 INNER JOIN cte t2
ON t2.gid = t1.gid AND t2.sid <> t1.sid
GROUP BY t1.sid, t2.sid
ORDER BY t1.sid, t2.sid;

See the demo.

CodePudding user response：

You are linking relations in two levels max. A more generic query can be done for three or more levels, but the following one works for two:

with
r as (
  select a.gid, a.sid, b.sid as aid
  from data a join data b on b.gid = a.gid and a.sid <> b.sid
),
c as (
  select x.*, y.gid as rgid from r x
  left join r y on y.gid <> x.gid and x.sid = y.sid and x.aid = y.aid
),
e as (
  select distinct *
  from c 
  where rgid is not null or rgid is null and not exists (
    select 1 from c d where d.sid = c.sid and d.aid = c.aid and rgid is not null
  )
)
select sid, group_concat(aid) as associated_sid, count(*) as freq
from e
group by sid, aid
order by sid, aid

Result:

 sid  associated_sid  freq 
 ---- --------------- ---- 
 s1   s2              1    
 s1   s3              1    
 s2   s1              1    
 s2   s3,s3           2    
 s2   s4,s4           2    
 s3   s1              1    
 s3   s2,s2           2    
 s3   s4,s4           2    
 s4   s2,s2           2    
 s4   s3,s3           2

See running example at db<>fiddle.