I'm trying to create a row for each person, str but I am getting extra output.
Can someone please explain what I did wrong and show me how to fix it.
Below is my test CASE and expected results. Thanks to all who answer and your expertise.
with rws as (
select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL
select 'Jane' person, 'AB' str from dual
)
select person,
regexp_substr (
str,
'[^,] ',
1,
level
) value
from rws
connect by level <=
length ( str ) - length ( replace ( str, ',' ) ) 1
ORDER BY person, str;
PERSON VALUE
Bob AB
Bob CR
Bob DE
Bob DE
Bob CR
Jane AB
Expected results
PERSON VALUE
Bob AB
Bob CR
Bob DE
Jane AB
CodePudding user response:
The problem with your original query is that connect-by is looking at previous rows more than once - essentially, the second level of rows for Bob is also picking up the first row for Jane. This is a fairly well-known issue. You can avoid that by including a unique ID (with this example you'd have to rely on the name, and hope it's unique); but that then will loop, which you can avoid by adding a non-deterministic function call:
...
connect by level <=
length ( str ) - length ( replace ( str, ',' ) ) 1
and prior person = person
and prior dbms_random.value is not null
ORDER BY person, str;
You could also use recursive subquery factoring instead of a hierarchical query:
with rws as (
select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL
select 'Jane' person, 'AB' str from dual
),
rcte (person, str, cnt, lvl, value) as (
select person, str, length ( str ) - length ( replace ( str, ',' ) ), 1,
regexp_substr (
str,
'[^,] ',
1,
1
)
from rws
union all
select person, str, cnt, lvl 1,
regexp_substr (
str,
'[^,] ',
1,
lvl 1
)
from rcte
where lvl <= cnt
)
select person, value
from rcte
order by person, value;
but you might find one of the other answers performs better, or at least is easy to understand and maintain.
Incidentally, your regular expression pattern might cause issues if you ever have a null element (i.e. two adjacent commas); this this answer for an explanation.
CodePudding user response:
Here's one option:
SQL> WITH
2 rws
3 AS
4 (SELECT 'Bob' person, 'AB,CR,DE' str FROM DUAL
5 UNION ALL
6 SELECT 'Jane' person, 'AB' str FROM DUAL)
7 SELECT person,
8 REGEXP_SUBSTR (str,
9 '[^,] ',
10 1,
11 COLUMN_VALUE) VALUE
12 FROM rws
13 CROSS JOIN
14 TABLE (
15 CAST (
16 MULTISET ( SELECT LEVEL
17 FROM DUAL
18 CONNECT BY LEVEL <= REGEXP_COUNT (str, ',') 1)
19 AS SYS.odcinumberlist))
20 ORDER BY person, str;
PERS VALUE
---- --------
Bob AB
Bob CR
Bob DE
Jane AB
SQL>
Your solution would return desired result if you applied SELECT DISTINCT
(and fixed order by
clause, but that's irrelevant), but that would also behave badly as number of rows you're working with grows.
SQL> with rws as (
2 select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL
3 select 'Jane' person, 'AB' str from dual
4 )
5 select distinct person,
6 regexp_substr (
7 str,
8 '[^,] ',
9 1,
10 level
11 ) value
12 from rws
13 connect by level <=
14 length ( str ) - length ( replace ( str, ',' ) ) 1;
PERS VALUE
---- --------
Jane AB
Bob CR
Bob AB
Bob DE
SQL>
CodePudding user response:
You can use a recursive query and simple string functions (which is slightly more to type but is faster than regular expressions):
with rws (person, str) as (
select 'Bob', 'AB,CR,DE' from dual UNION ALL
select 'Jane', 'AB' from dual
),
bounds (person, str, spos, epos) AS (
SELECT person,
str,
1,
INSTR(str, ',', 1)
FROM rws
UNION ALL
SELECT person,
str,
epos 1,
INSTR(str, ',', epos 1)
FROM bounds
WHERE epos > 0
)
SELECT person,
CASE epos
WHEN 0
THEN SUBSTR(str, spos)
ELSE SUBSTR(str, spos, epos - spos)
END AS value
FROM bounds
ORDER BY person, value;
Which outputs:
PERSON | VALUE |
---|---|
Bob | AB |
Bob | CR |
Bob | DE |
Jane | AB |
CodePudding user response:
If you don't have quotes in the data, for 12c you may use JSON_TABLE
and lateral join instead of recursion.
with rws as ( select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL select 'Jane' person, 'AB' str from dual union all select 'Mark', null from dual ) select rws.person, l.val_splitted, l.rn from rws left join lateral ( select * from json_table( '["' || replace(rws.str, ',', '","') || '"]', '$[*]' columns ( val_splitted varchar2(10) path '$', rn for ordinality ) ) ) l on 1 = 1 order by 1
PERSON | VAL_SPLITTED | RN |
---|---|---|
Bob | AB | 1 |
Bob | CR | 2 |
Bob | DE | 3 |
Jane | AB | 1 |
Mark | 1 |