Home > Back-end >  Duplicate values when splitting a string
Duplicate values when splitting a string

Time:01-11

I'm trying to create a row for each person, str but I am getting extra output.

Can someone please explain what I did wrong and show me how to fix it.

Below is my test CASE and expected results. Thanks to all who answer and your expertise.


with rws as (
  select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL 
  select 'Jane' person, 'AB' str from dual 
)
  select person,
       regexp_substr (
           str,
           '[^,] ',
           1,
           level
         ) value
  from   rws
  connect by level <= 
    length ( str ) - length ( replace ( str, ',' ) )   1
ORDER BY person, str;

PERSON  VALUE
Bob AB
Bob CR
Bob DE
Bob DE
Bob CR
Jane AB

Expected results 
PERSON  VALUE
Bob AB
Bob CR
Bob DE
Jane AB

CodePudding user response:

The problem with your original query is that connect-by is looking at previous rows more than once - essentially, the second level of rows for Bob is also picking up the first row for Jane. This is a fairly well-known issue. You can avoid that by including a unique ID (with this example you'd have to rely on the name, and hope it's unique); but that then will loop, which you can avoid by adding a non-deterministic function call:

...
  connect by level <= 
    length ( str ) - length ( replace ( str, ',' ) )   1
  and prior person = person
  and prior dbms_random.value is not null
ORDER BY person, str;

You could also use recursive subquery factoring instead of a hierarchical query:

with rws as (
  select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL 
  select 'Jane' person, 'AB' str from dual 
),
rcte (person, str, cnt, lvl, value) as (
  select person, str, length ( str ) - length ( replace ( str, ',' ) ), 1,
       regexp_substr (
           str,
           '[^,] ',
           1,
           1
         )
  from   rws
  union all
  select person, str, cnt, lvl   1,
       regexp_substr (
           str,
           '[^,] ',
           1,
           lvl   1
         )
  from   rcte
  where  lvl <= cnt
)
select person, value
from rcte
order by person, value;

fiddle

but you might find one of the other answers performs better, or at least is easy to understand and maintain.

Incidentally, your regular expression pattern might cause issues if you ever have a null element (i.e. two adjacent commas); this this answer for an explanation.

CodePudding user response:

Here's one option:

SQL> WITH
  2     rws
  3     AS
  4        (SELECT 'Bob' person, 'AB,CR,DE' str FROM DUAL
  5         UNION ALL
  6         SELECT 'Jane' person, 'AB' str FROM DUAL)
  7    SELECT person,
  8           REGEXP_SUBSTR (str,
  9                          '[^,] ',
 10                          1,
 11                          COLUMN_VALUE) VALUE
 12      FROM rws
 13           CROSS JOIN
 14           TABLE (
 15              CAST (
 16                 MULTISET (    SELECT LEVEL
 17                                 FROM DUAL
 18                           CONNECT BY LEVEL <= REGEXP_COUNT (str, ',')   1)
 19                    AS SYS.odcinumberlist))
 20  ORDER BY person, str;

PERS VALUE
---- --------
Bob  AB
Bob  CR
Bob  DE
Jane AB

SQL>

Your solution would return desired result if you applied SELECT DISTINCT (and fixed order by clause, but that's irrelevant), but that would also behave badly as number of rows you're working with grows.

SQL> with rws as (
  2    select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL
  3    select 'Jane' person, 'AB' str from dual
  4  )
  5    select distinct person,
  6         regexp_substr (
  7             str,
  8             '[^,] ',
  9             1,
 10             level
 11           ) value
 12    from   rws
 13    connect by level <=
 14      length ( str ) - length ( replace ( str, ',' ) )   1;

PERS VALUE
---- --------
Jane AB
Bob  CR
Bob  AB
Bob  DE

SQL>

CodePudding user response:

You can use a recursive query and simple string functions (which is slightly more to type but is faster than regular expressions):

with rws (person, str) as (
  select 'Bob', 'AB,CR,DE' from dual UNION ALL 
  select 'Jane', 'AB'      from dual 
),
bounds (person, str, spos, epos) AS (
  SELECT person,
         str,
         1,
         INSTR(str, ',', 1)
  FROM   rws
UNION ALL
  SELECT person,
         str,
         epos   1,
         INSTR(str, ',', epos   1)
  FROM   bounds
  WHERE  epos > 0
)
SELECT person,
       CASE epos
       WHEN 0
       THEN SUBSTR(str, spos)
       ELSE SUBSTR(str, spos, epos - spos)
       END AS value
FROM   bounds
ORDER BY person, value;

Which outputs:

PERSON VALUE
Bob AB
Bob CR
Bob DE
Jane AB

fiddle

CodePudding user response:

If you don't have quotes in the data, for 12c you may use JSON_TABLE and lateral join instead of recursion.

with rws as (
  select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL 
  select 'Jane' person, 'AB' str from dual union all
  select 'Mark', null from dual
)
select
  rws.person,
  l.val_splitted,
  l.rn
from rws
  left join lateral (
    select *
    from json_table(
      '["' || replace(rws.str, ',', '","') || '"]',
      '$[*]'
      columns (
        val_splitted varchar2(10) path '$',
        rn for ordinality
      )
    )
  ) l
  on 1 = 1
order by 1
PERSON VAL_SPLITTED RN
Bob AB 1
Bob CR 2
Bob DE 3
Jane AB 1
Mark 1
  • Related