I have a project
table which contains columns studentid
, projectid
. The students worked in group to complete the project and a students is allowed to participate in more than one project. I wish to query on how many students have partnered up together on 10 or more project.
studentid | projectid |
---|---|
1 | abx |
2 | abx |
3 | abx |
1 | abn |
4 | abn |
How do I do a nested query to find out each unique studentid
have occurrence with other studentid
for more than 10 times?
CodePudding user response:
First self-join the table in order to get the students that worked together. Then aggregate and count.
In order not to look at the same pair twice (s1/s2 <=> s2/s1), I don't join on s1 <> s2, but on s1 < s2.
select t1.studentid, t2.studentid, count(*)
from mytable t1
join mytable t2 on t2.projectid = t1.projectid and t2.studentid > t1.studentid
group by t1.studentid, t2.studentid
having count(*) >= 10;
CodePudding user response:
We do a self-join on project and group by pairs of students.
Where student < student avoids having the same student with themself and the same pairs of students both ways around.
The icing on the cake is a list of projects togehter using group_concat (use string_agg in SQL server.
create table project ( student int, project varchar(10));
insert into project values (1, 'abx'), (2, 'abx'), (3, 'abx'), (4, 'abx'), (1, 'abn'), (4, 'abn');
select group_concat(a.project) projects, a.student student_1, b.student student_2, count(a.project) Number_projects from project a join project b on a.project = b.project where a.student < b.student group by a.student, b.student;
projects | student_1 | student_2 | Number_projects :------- | --------: | --------: | --------------: abx | 1 | 2 | 1 abx | 1 | 3 | 1 abn,abx | 1 | 4 | 2 abx | 2 | 3 | 1 abx | 2 | 4 | 1 abx | 3 | 4 | 1
db<>fiddle here