Oracle | Group by to fetch unique comma separated values-CodePudding

I have a table with a skills column having comma separated values. The skills can have duplicates per student. I need to use group by to list unique skills per student.

What should I be using so that I get list with no duplicates. Please help.

Input Table student_skills:

STUDENT_ID	SEMESTER	SKILLS
101	1	C, SQL
101	2	C, CPP
102	1	CPP, Java
102	2	Java, JavaScript

Desired result:

STUDENT_ID	SKILLS
101	C, SQL, CPP
102	CPP, Java, JavaScript

SQL Queries to create table, insert data:

create table student_skills
(STUDENT_ID number(10),
SEMESTER varchar2(5),
SKILLS varchar2(50));

insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (101, '1', 'C, SQL');
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (101, '2', 'C, CPP');
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (102, '1', 'CPP, Java');
insert into student_skills (STUDENT_ID, SEMESTER, SKILLS)
values (102, '2', 'Java, JavaScript');
commit;

CodePudding user response：

One option is to split skills into rows (the temp CTE), and then aggregate them back (line #11):

SQL> with temp as
  2    (select distinct
  3            student_id,
  4            trim(regexp_substr(skills, '[^,] ', 1, column_value)) skill
  5     from student_skills cross join
  6       table(cast(multiset(select level from dual
  7                           connect by level <= regexp_count(skills, ',')   1
  8                          ) as sys.odcinumberlist))
  9    )
 10  select student_id,
 11         listagg(skill, ', ') within group (order by skill) skills
 12  from temp
 13  group by student_id;

STUDENT_ID SKILLS
---------- ------------------------------
       101 C, CPP, SQL
       102 CPP, Java, JavaScript

SQL>

CodePudding user response：

Life would be so much easier if your input data was in first normal form... The way it is, you must split the strings first. No doubt those comma separated strings were generated by aggregation in the first place; is it not possible to recover an earlier stage, where the skills were shown one per row?

Assuming you have no say in the matter, you must split the input strings first, before you can de-duplicate and aggregate again. One hack is to use JSON functions for this, for example:

select student_id, 
       listagg(distinct skill, ', ')
         within group (order by semester, ord) as skills
from   student_skills,
       json_table('["' || replace(skills, ', ', '","') || '"]', '$[*]'
         columns (
           skill path '$',
           ord   for ordinality
         )
       )
group  by student_id
order  by student_id
;

CodePudding user response：

You can use a subquery along with DISTINCT clause while extracting the substrings seperated by comma, then apply LISTAGG() function to recombine the pieces such as

WITH s AS
(
 SELECT DISTINCT student_id, REGEXP_SUBSTR(skills,'[^,] ',1,level) AS skills
   FROM student_skills
CONNECT BY level <= REGEXP_COUNT(skills,',') 1
    AND PRIOR student_id = student_id
    AND PRIOR sys_guid() IS NOT NULL 
)
SELECT student_id, LISTAGG(skills,', ') WITHIN GROUP (ORDER BY 0) AS skills
  FROM s
 GROUP BY student_id

Demo

CodePudding user response：

You can use XPATH functions to find the distinct values:

SELECT student_id,
       XMLQUERY(
         'string-join(distinct-values(//text()), ", ")'
         PASSING XMLTYPE(skills)
         RETURNING CONTENT
       ).getStringVal() AS skills
FROM   (
  SELECT student_id,
         '<b>'
         || LISTAGG('<a>' || REPLACE(skills, ', ', '</a><a>') || '</a>')
              WITHIN GROUP (ORDER BY semester)
         || '</b>' AS skills
  FROM   student_skills
  GROUP BY
         student_id
);

Which, for the sample data, outputs:

STUDENT_ID SKILLS

101 C, SQL, CPP

102 CPP, Java, JavaScript

db<>fiddle here