I am using Regex Substring to filter out values that have 'p' in the start and ends before '-'. p is followed by 6 digits.
My Code :
code,REGEXP_SUBSTR(CODE,'^[p][^-] ')
CODE |
REGEXP_SUBSTR(CODE,'^[P][^-] ') |
---|---|
p700401- | p700401 |
p791701- | p791701 |
100-,p788001-, | null |
This is the result , but I am struggling to handle cases like in 3rd Row.
100-,p788001-
Can Someone Please guide me to handle such cases
CodePudding user response:
For sample data you posted, this returns the result you wanted (i.e. take "p" followed by exactly 6 digits):
SQL> with test (code) as
2 (select 'p700401-' from dual union all
3 select 'p791701-' from dual union all
4 select '100-,p788001-,' from dual
5 )
6 select code,
7 regexp_substr(code, 'p\d{6}') result
8 from test;
CODE RESULT
-------------- --------------
p700401- p700401
p791701- p791701
100-,p788001-, p788001
SQL>
CodePudding user response:
If you want to match complete terms in a comma-delimited string then you can use:
SELECT code,
REGEXP_SUBSTR(code, '(^|,)(p\d{6})-(,|$)', 1, 1, NULL, 2) AS result
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name (code) as
SELECT 'p700401-' FROM DUAL UNION ALL
SELECT 'p791701-' FROM DUAL UNION ALL
SELECT '100-,p788001-,' FROM DUAL UNION ALL
SELECT '123-,p456789-xyz,p987654-' FROM DUAL UNION ALL
SELECT 'p111111-,p222222-not_this,p333333-,p444444-' FROM DUAL;
Outputs:
CODE | RESULT |
---|---|
p700401- | p700401 |
p791701- | p791701 |
100-,p788001-, | p788001 |
123-,p456789-xyz,p987654- | p987654 |
p111111-,p222222-not_this,p333333-,p444444- | p111111 |
Displaying multiple terms
If you want to remove the non-matching terms from the string then:
SELECT code,
LTRIM(
REGEXP_REPLACE(
',' || REPLACE(code, ',', ',,') || ',',
'((,p\d{6})-,)|,.*?,',
'\2'
),
','
) AS result
FROM table_name;
Which, outputs:
CODE | RESULT |
---|---|
p700401- | p700401 |
p791701- | p791701 |
100-,p788001-, | p788001 |
123-,p456789-xyz,p987654- | p987654 |
p111111-,p222222-not_this,p333333-,p444444- | p111111,p333333,p444444 |
And if you want to split the list into rows then:
SELECT t.code,
i.*
FROM (
SELECT code,
',' || REPLACE(code, ',', ',,') || ',' AS double_delims
FROM table_name
) t
INNER JOIN LATERAL (
SELECT LEVEL As item,
REGEXP_SUBSTR(double_delims, ',(p\d{6})-,|,(.*?),', 1, LEVEL, NULL, 1)
AS value
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(double_delims, ',(p\d{6})-,|,(.*?),')
) i
ON (i.value IS NOT NULL);
Which outputs:
CODE | ITEM | VALUE |
---|---|---|
p700401- | 1 | p700401 |
p791701- | 1 | p791701 |
100-,p788001-, | 2 | p788001 |
123-,p456789-xyz,p987654- | 3 | p987654 |
p111111-,p222222-not_this,p333333-,p444444- | 1 | p111111 |
p111111-,p222222-not_this,p333333-,p444444- | 3 | p333333 |
p111111-,p222222-not_this,p333333-,p444444- | 4 | p444444 |
CodePudding user response:
Right, my two cents is to use REGEXP_REPLACE()
:
CREATE TABLE tst (code) as
SELECT 'p700401-' FROM DUAL UNION ALL
SELECT 'p791701-' FROM DUAL UNION ALL
SELECT '100-,z123456' FROM DUAL UNION ALL
SELECT '100-,p788001-,' FROM DUAL UNION ALL
SELECT 'p788001-,100-' FROM DUAL UNION ALL
SELECT '123-,p456789-xyz,p987654-' FROM DUAL;
SELECT
code, REGEXP_REPLACE(REGEXP_REPLACE(code, '(p\d{6})-|.', '\1'), '(\d)(p)', '\1,\2') AS result
FROM tst
Resuls in:
CODE | RESULT |
---|---|
p700401- | p700401 |
p791701- | p791701 |
100-,z123456 | null |
100-,p788001-, | p788001 |
p788001-,100- | p788001 |
123-,p456789-xyz,p987654- | p456789,p987654 |
It's a nested statement due to the lack of support for handy regex syntax as per given link.
The 1st regex pattern is supposed to replace anything other than what you are after, see an online demo. The 2nd one is there to insert comma's back to seperate these values, see the demo.