I have a table with a skills column having comma separated values. The skills can have duplicates per student. I need to use group by to list unique skills per student.
What should I be using so that I get list with no duplicates. Please help.
Input Table student_skills
:
STUDENT_ID | SEMESTER | SKILLS |
---|---|---|
101 | 1 | C, SQL |
101 | 2 | C, CPP |
102 | 1 | CPP, Java |
102 | 2 | Java, JavaScript |
Desired result:
STUDENT_ID | SKILLS |
---|---|
101 | C, SQL, CPP |
102 | CPP, Java, JavaScript |
SQL Queries to create table, insert data:
create table student_skills
(STUDENT_ID number(10),
SEMESTER varchar2(5),
SKILLS varchar2(50));
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (101, '1', 'C, SQL');
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (101, '2', 'C, CPP');
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (102, '1', 'CPP, Java');
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (102, '2', 'Java, JavaScript');
commit;
CodePudding user response:
One option is to split skills into rows (the temp
CTE), and then aggregate them back (line #11):
SQL> with temp as
2 (select distinct
3 student_id,
4 trim(regexp_substr(skills, '[^,] ', 1, column_value)) skill
5 from student_skills cross join
6 table(cast(multiset(select level from dual
7 connect by level <= regexp_count(skills, ',') 1
8 ) as sys.odcinumberlist))
9 )
10 select student_id,
11 listagg(skill, ', ') within group (order by skill) skills
12 from temp
13 group by student_id;
STUDENT_ID SKILLS
---------- ------------------------------
101 C, CPP, SQL
102 CPP, Java, JavaScript
SQL>
CodePudding user response:
You can use a subquery along with DISTINCT
clause while extracting the substrings seperated by comma, then apply LISTAGG()
function to recombine the pieces such as
WITH s AS
(
SELECT DISTINCT student_id, REGEXP_SUBSTR(skills,'[^ ,] ',1,level) AS skills
FROM student_skills
CONNECT BY level <= REGEXP_COUNT(skills,',') 1
AND PRIOR student_id = student_id
AND PRIOR sys_guid() IS NOT NULL
)
SELECT student_id, LISTAGG(skills,', ') WITHIN GROUP (ORDER BY 0) AS skills
FROM s
GROUP BY student_id