Regular Expression Oracle SQL : handling multiple cases-CodePudding

I am using Regex Substring to filter out values that have 'p' in the start and ends before '-'. p is followed by 6 digits.

My Code :

 code,REGEXP_SUBSTR(CODE,'^[p][^-] ')

`CODE`	`REGEXP_SUBSTR(CODE,'^[P][^-] ')`
p700401-	p700401
p791701-	p791701
100-,p788001-,	null

This is the result , but I am struggling to handle cases like in 3rd Row.

100-,p788001-

Can Someone Please guide me to handle such cases

CodePudding user response：

For sample data you posted, this returns the result you wanted (i.e. take "p" followed by exactly 6 digits):

SQL> with test (code) as
  2    (select 'p700401-' from dual union all
  3     select 'p791701-' from dual union all
  4     select '100-,p788001-,' from dual
  5    )
  6  select code,
  7         regexp_substr(code, 'p\d{6}') result
  8  from test;

CODE           RESULT
-------------- --------------
p700401-       p700401
p791701-       p791701
100-,p788001-, p788001

SQL>

CodePudding user response：

If you want to match complete terms in a comma-delimited string then you can use:

SELECT code,
       REGEXP_SUBSTR(code, '(^|,)(p\d{6})-(,|$)', 1, 1, NULL, 2) AS result
FROM   table_name;

Which, for the sample data:

CREATE TABLE table_name (code) as
  SELECT 'p700401-' FROM DUAL UNION ALL
  SELECT 'p791701-' FROM DUAL UNION ALL
  SELECT '100-,p788001-,' FROM DUAL UNION ALL
  SELECT '123-,p456789-xyz,p987654-' FROM DUAL UNION ALL
  SELECT 'p111111-,p222222-not_this,p333333-,p444444-' FROM DUAL;

Outputs:

CODE	RESULT
p700401-	p700401
p791701-	p791701
100-,p788001-,	p788001
123-,p456789-xyz,p987654-	p987654
p111111-,p222222-not_this,p333333-,p444444-	p111111

Displaying multiple terms

If you want to remove the non-matching terms from the string then:

SELECT code,
       LTRIM(
         REGEXP_REPLACE(
           ',' || REPLACE(code, ',', ',,') || ',',
           '((,p\d{6})-,)|,.*?,',
           '\2'
         ),
         ','
       ) AS result
FROM   table_name;

Which, outputs:

CODE	RESULT
p700401-	p700401
p791701-	p791701
100-,p788001-,	p788001
123-,p456789-xyz,p987654-	p987654
p111111-,p222222-not_this,p333333-,p444444-	p111111,p333333,p444444

And if you want to split the list into rows then:

SELECT t.code,
       i.*
FROM   (
         SELECT code,
                ',' || REPLACE(code, ',', ',,') || ',' AS double_delims
         FROM   table_name
       ) t
       INNER JOIN LATERAL (
         SELECT LEVEL As item,
                REGEXP_SUBSTR(double_delims, ',(p\d{6})-,|,(.*?),', 1, LEVEL, NULL, 1)
                  AS value
         FROM   DUAL
         CONNECT BY LEVEL <= REGEXP_COUNT(double_delims, ',(p\d{6})-,|,(.*?),')
       ) i
       ON (i.value IS NOT NULL);

Which outputs:

CODE	ITEM	VALUE
p700401-	1	p700401
p791701-	1	p791701
100-,p788001-,	2	p788001
123-,p456789-xyz,p987654-	3	p987654
p111111-,p222222-not_this,p333333-,p444444-	1	p111111
p111111-,p222222-not_this,p333333-,p444444-	3	p333333
p111111-,p222222-not_this,p333333-,p444444-	4	p444444

fiddle

CodePudding user response：

Right, my two cents is to use REGEXP_REPLACE():

CREATE TABLE tst (code) as
  SELECT 'p700401-' FROM DUAL UNION ALL
  SELECT 'p791701-' FROM DUAL UNION ALL
  SELECT '100-,z123456' FROM DUAL UNION ALL
  SELECT '100-,p788001-,' FROM DUAL UNION ALL
  SELECT 'p788001-,100-' FROM DUAL UNION ALL
  SELECT '123-,p456789-xyz,p987654-' FROM DUAL;

SELECT
  code, REGEXP_REPLACE(REGEXP_REPLACE(code, '(p\d{6})-|.', '\1'), '(\d)(p)', '\1,\2') AS result
FROM tst

Resuls in:

CODE	RESULT
p700401-	p700401
p791701-	p791701
100-,z123456	null
100-,p788001-,	p788001
p788001-,100-	p788001
123-,p456789-xyz,p987654-	p456789,p987654

It's a nested statement due to the lack of support for handy regex syntax as per given link.

The 1st regex pattern is supposed to replace anything other than what you are after, see an online demo. The 2nd one is there to insert comma's back to seperate these values, see the demo.