I have a table with a skills column having comma separated values. The skills can have duplicates per student. I need to use group by to list unique skills per student.
What should I be using so that I get list with no duplicates. Please help.
Input Table student_skills
:
STUDENT_ID | SEMESTER | SKILLS |
---|---|---|
101 | 1 | C, SQL |
101 | 2 | C, CPP |
102 | 1 | CPP, Java |
102 | 2 | Java, JavaScript |
Desired result:
STUDENT_ID | SKILLS |
---|---|
101 | C, SQL, CPP |
102 | CPP, Java, JavaScript |
SQL Queries to create table, insert data:
create table student_skills
(STUDENT_ID number(10),
SEMESTER varchar2(5),
SKILLS varchar2(50));
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (101, '1', 'C, SQL');
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (101, '2', 'C, CPP');
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (102, '1', 'CPP, Java');
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (102, '2', 'Java, JavaScript');
commit;
CodePudding user response:
One option is to split skills into rows (the temp
CTE), and then aggregate them back (line #11):
SQL> with temp as
2 (select distinct
3 student_id,
4 trim(regexp_substr(skills, '[^,] ', 1, column_value)) skill
5 from student_skills cross join
6 table(cast(multiset(select level from dual
7 connect by level <= regexp_count(skills, ',') 1
8 ) as sys.odcinumberlist))
9 )
10 select student_id,
11 listagg(skill, ', ') within group (order by skill) skills
12 from temp
13 group by student_id;
STUDENT_ID SKILLS
---------- ------------------------------
101 C, CPP, SQL
102 CPP, Java, JavaScript
SQL>
CodePudding user response:
Life would be so much easier if your input data was in first normal form... The way it is, you must split the strings first. No doubt those comma separated strings were generated by aggregation in the first place; is it not possible to recover an earlier stage, where the skills were shown one per row?
Assuming you have no say in the matter, you must split the input strings first, before you can de-duplicate and aggregate again. One hack is to use JSON functions for this, for example:
select student_id,
listagg(distinct skill, ', ')
within group (order by semester, ord) as skills
from student_skills,
json_table('["' || replace(skills, ', ', '","') || '"]', '$[*]'
columns (
skill path '$',
ord for ordinality
)
)
group by student_id
order by student_id
;
CodePudding user response:
You can use a subquery along with DISTINCT
clause while extracting the substrings seperated by comma, then apply LISTAGG()
function to recombine the pieces such as
WITH s AS
(
SELECT DISTINCT student_id, REGEXP_SUBSTR(skills,'[^,] ',1,level) AS skills
FROM student_skills
CONNECT BY level <= REGEXP_COUNT(skills,',') 1
AND PRIOR student_id = student_id
AND PRIOR sys_guid() IS NOT NULL
)
SELECT student_id, LISTAGG(skills,', ') WITHIN GROUP (ORDER BY 0) AS skills
FROM s
GROUP BY student_id
CodePudding user response:
You can use XPATH functions to find the distinct values:
SELECT student_id,
XMLQUERY(
'string-join(distinct-values(//text()), ", ")'
PASSING XMLTYPE(skills)
RETURNING CONTENT
).getStringVal() AS skills
FROM (
SELECT student_id,
'<b>'
|| LISTAGG('<a>' || REPLACE(skills, ', ', '</a><a>') || '</a>')
WITHIN GROUP (ORDER BY semester)
|| '</b>' AS skills
FROM student_skills
GROUP BY
student_id
);
Which, for the sample data, outputs:
STUDENT_ID SKILLS 101 C, SQL, CPP 102 CPP, Java, JavaScript
db<>fiddle here