Remove duplicate from strings in sql-CodePudding

I have 2 cols ID Value

ab^bc^ab^de
mn^mn^op

I want the output as ID Value

ab^bc^de
mn^op

Can someone please help me in this.✋ I have around 500 rows in the table.

I tried using stuff and other ways but errors are popping up.

CodePudding user response：

You can use a recursive query and simple string functions (which is much faster than regular expressions, but a little more to type) to split the string and then, in later Oracle versions, can re-aggregate it using LISTAGG(DISTINCT ...:

WITH bounds ( rid, value, spos, epos ) AS (
  SELECT ROWID, value, 1, INSTR(value, '^', 1)
  FROM   table_name
UNION ALL
  SELECT rid, value, epos   1, INSTR(value, '^', epos   1)
  FROM   bounds
  WHERE  epos > 0
)
SELECT LISTAGG(
         DISTINCT 
         CASE epos
         WHEN 0
         THEN SUBSTR(value, spos)
         ELSE SUBSTR(value, spos, epos - spos)
         END,
         '^'
       ) WITHIN GROUP (ORDER BY spos) AS unique_values
FROM   bounds
GROUP BY rid;

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'ab^bc^ab^de' FROM DUAL UNION ALL
SELECT 'mn^mn^op' FROM DUAL UNION ALL
SELECT 'ab^bc^ab^de' FROM DUAL UNION ALL
SELECT 'one^two^three^one^two^one^four' FROM DUAL;

Outputs:

UNIQUE_VALUES
ab^bc^de
mn^op
ab^bc^de
one^two^three^four

If you are using earlier versions or Oracle that do not support DISTINCT in the LISTAGG then you can aggregate twice:

WITH bounds ( rid, value, spos, epos ) AS (
  SELECT ROWID, value, 1, INSTR(value, '^', 1)
  FROM   table_name
UNION ALL
  SELECT rid, value, epos   1, INSTR(value, '^', epos   1)
  FROM   bounds
  WHERE  epos > 0
),
words (rid, word, spos) AS (
  SELECT rid,
         CASE epos
         WHEN 0
         THEN SUBSTR(value, spos)
         ELSE SUBSTR(value, spos, epos - spos)
         END,
         spos
  FROM   bounds
),
unique_words ( rid, word, spos ) AS (
  SELECT rid,
         word,
         MIN(spos)
  FROM   words
  GROUP BY rid, word
)
SELECT LISTAGG(word, '^') WITHIN GROUP (ORDER BY spos) AS unique_values
FROM   unique_words
GROUP BY rid;

Which gives the same output.

fiddle

CodePudding user response：

For example:

Sample data:

SQL> with
  2  test (col) as
  3    (select 'ab^bc^ab^de' from dual union all
  4     select 'mn^mn^op'    from dual
  5    ),

Split values into rows:

  6  temp as
  7    (select
  8        col,
  9        regexp_substr(col, '[^\^] ', 1, column_value) val,
 10        column_value lvl
 11     from test cross join
 12       table(cast(multiset(select level from dual
 13                           connect by level <= regexp_count(col, '\^')   1
 14                          ) as sys.odcinumberlist))
 15    )

Aggregate them back, using only distinct values:

 16  select col,
 17         listagg(val, '^') within group (order by lvl) as result
 18  from (select col, val, min(lvl) lvl
 19        from temp
 20        group by col, val
 21       )
 22  group by col;

COL         RESULT
----------- --------------------
ab^bc^ab^de ab^bc^de
mn^mn^op    mn^op

SQL>

CodePudding user response：

Other solutions if your ORACLE version is recent enough to have LISTAGG DISTINCT:

with data(s) as (
    select 'ab^bc^ab^de' from dual union all
    select 'mn^mn^op' from dual
),
splitted(s, l, r) as (
    select s, level, regexp_substr(s,'[^\^] ',1,level) from data
    connect by regexp_substr(s,'[^\^] ',1,level) is not null and s = prior s and prior sys_guid() is not null
)
select s, listagg(distinct r, '^') within group(order by l) as r  from splitted
group by s
;

And better if you have a PK, use it:

with data(id, s) as (
    select 1, 'ab^bc^ab^de' from dual union all
    select 2, 'mn^mn^op' from dual
),
splitted(id, l, r) as (
    select id, level, regexp_substr(s,'[^\^] ',1,level) from data
    connect by regexp_substr(s,'[^\^] ',1,level) is not null and id = prior id and prior sys_guid() is not null
)
select id, listagg(distinct r, '^') within group(order by l) as r  from splitted
group by id
;