I am trying to find assosiate sid for each sid. I was trying with self join but could not find the count for sid linked with associated sid.
Data :
gid sid
g1 s1
g1 s2
g2 s2
g2 s3
g2 s4
g3 s1
g3 s3
g4 s2
g4 s3
g4 s4
g2 s2
Expected Output:
sid associated_sid freq of sid occuring with other sid
s1 s2 1
s1 s3 1
s2 s1 1
s2 s3, s3 2
s2 s4, s4 2
s3 s1 1
s3 s2, s2 2
s3 s4, s4 2
s4 s2, s2 2
s4 s3, s3 2
CodePudding user response:
Use a self join and aggregation with GROUP_CONCAT()
:
SELECT t1.sid,
GROUP_CONCAT(t2.sid) associated_sid,
COUNT(*) freq
FROM (SELECT DISTINCT gid, sid FROM tablename) t1
INNER JOIN (SELECT DISTINCT gid, sid FROM tablename) t2
ON t2.gid = t1.gid AND t2.sid <> t1.sid
GROUP BY t1.sid, t2.sid
ORDER BY t1.sid, t2.sid;
Or, with a CTE (MySql version 8.0 ):
WITH cte AS (SELECT DISTINCT gid, sid FROM tablename)
SELECT t1.sid,
GROUP_CONCAT(t2.sid) associated_sid,
COUNT(*) freq
FROM cte t1 INNER JOIN cte t2
ON t2.gid = t1.gid AND t2.sid <> t1.sid
GROUP BY t1.sid, t2.sid
ORDER BY t1.sid, t2.sid;
See the demo.
CodePudding user response:
You are linking relations in two levels max. A more generic query can be done for three or more levels, but the following one works for two:
with
r as (
select a.gid, a.sid, b.sid as aid
from data a join data b on b.gid = a.gid and a.sid <> b.sid
),
c as (
select x.*, y.gid as rgid from r x
left join r y on y.gid <> x.gid and x.sid = y.sid and x.aid = y.aid
),
e as (
select distinct *
from c
where rgid is not null or rgid is null and not exists (
select 1 from c d where d.sid = c.sid and d.aid = c.aid and rgid is not null
)
)
select sid, group_concat(aid) as associated_sid, count(*) as freq
from e
group by sid, aid
order by sid, aid
Result:
sid associated_sid freq
---- --------------- ----
s1 s2 1
s1 s3 1
s2 s1 1
s2 s3,s3 2
s2 s4,s4 2
s3 s1 1
s3 s2,s2 2
s3 s4,s4 2
s4 s2,s2 2
s4 s3,s3 2
See running example at db<>fiddle.