Home > Net >  Oracle regexp_substr returning different result in case of same varchar
Oracle regexp_substr returning different result in case of same varchar

Time:07-14

Here is the varchar 50% REC.PES/50% PTT that I have in the RFDGENERIC table.

The task is split by /.

Here is the query

select
    regexp_substr(DESCR, '[^/] ', 1, level) value
from 
    RFDGENERIC 
where 
    id = 14966150
    connect by level <= length(DESCR) - length(replace(DESCR, '/'))   1
             and prior DESCR = DESCR
             and prior sys_guid() is not null;

But the result is.

50% REC.PES
50% PTT
50% PTT

If I will create the table and insert into it the same value from RFDGENERIC the result will be correct.

CREATE TABLE TEST_SPLIT
(
    ID NUMBER(20),
    DESCR VARCHAR2(128 char)
);

INSERT INTO TEST_SPLIT (id, DESCR)
    SELECT id, DESCR 
    FROM RFDGENERIC 
    WHERE DESCR = '50% REC.PES/50% PTT' 
      AND id = 14966150;

SELECT
    regexp_substr(DESCR, '[^/] ', 1, level) value
FROM
    TEST_SPLIT 
WHERE
    id = 14966150
      connect by level <= length(DESCR) - length(replace(DESCR, '/'))   1
             and prior DESCR = DESCR
             and prior sys_guid() is not null;

In this case the result is correct:

50% REC.PES
50% PTT

Notes:

  1. All data types are the same.

  2. I have used other methods as described here. Same result.

  3. DBMS:

     Oracle (ver. Oracle Database 19c Enterprise Edition Release 
     19.0.0.0.0 - Production Version 19.15.0.0.0)
    
     Case sensitivity: plain=upper, delimited=exact
     Driver: Oracle JDBC driver (ver. 21.5.0.0.0, JDBC4.3)
    

Here is the question what is the difference?

Why in the first case I am getting the wrong result?

CodePudding user response:

As @mathguy commented, you only get that result if you have two rows in the table with the same ID:

CREATE TABLE rfdgeneric (id, descr) AS
SELECT 14966150, '50% REC.PES/50% PTT' FROM DUAL UNION ALL
SELECT 14966150, '50% PTT' FROM DUAL;

Then your query outputs:

VALUE
50% PTT
50% REC.PES
50% PTT

Which is as it should for that sample data.

However, there are many methods of splitting strings and you can use simple string functions that are much faster than regular expressions:

WITH bounds (descr, spos, epos) AS (
  SELECT descr,
         1,
         INSTR(descr, '/', 1)
  FROM   rfdgeneric
  WHERE  id = 14966150
UNION ALL
  SELECT descr,
         epos   1,
         INSTR(descr, '/', epos   1)
  FROM   bounds
  WHERE  epos > 0
)
SELECT CASE epos
       WHEN 0
       THEN SUBSTR(descr, spos)
       ELSE SUBSTR(descr, spos, epos - spos)
       END AS descr
FROM   bounds;

Which has the same output:

DESCR
50% REC.PES
50% PTT
50% PTT

db<>fiddle here

CodePudding user response:

I get the same result using both queries in 21cXE; don't have 19c to try it on that version.


Anyway, perhaps you should switch to

SQL> with test_split (id, descr) as
  2    (select 14966150, '50% REC.PES/50% PTT' from dual)
  3  select regexp_substr(descr, '[^/] ', 1, column_value) val
  4  from test_split cross join
  5    table(cast(multiset(select level from dual
  6                        connect by level <= regexp_count(descr, '/')   1
  7                       ) as sys.odcinumberlist));

VAL
----------------------------------------------------------------------------
50% REC.PES
50% PTT

SQL>

which is different from yours as it takes care about potential duplicates in the table (you could "fix" that by adding the DISTINCT to your query, but that's not really a solution); also, it kind of simplifies the where clause (you don't need prior and sys_guid, unless I'm wrong).

  • Related