Oracle Regex Substr to ignore optional group-CodePudding

I am working on regex_substr to get the following result in oracle

Test 1.Input: JOHN 10BC STUDENT Desired Output: JOHN

Test 2.Input: JOHN  STUDENT     Desired Output: JOHN

Test 3.Input: JOHN 10BC STU     Desired Output: JOHN

Test 4.Input: JOHN 10BC TEACHER Desired Output:NULL

Test 5.Input: JOHN  TEACHER     Desired Output:NULL

Query:

Test 1:

select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s .*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;

Output: Failed . Returned JOHN 10BC instead of JOHN

Test 2:

select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s .*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;

Output: Passed. Returned JOHN

I modified the query as below by removing the ? in the second block

Test 1:select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s .*BC)\sSTU(DENT)?',1,1,'i',1) from dual; Output: Passed. Returned JOHN

Test 2:select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s .*BC)\sSTU(DENT)?',1,1,'i',1) from dual; Output: Failed. Returned Null instead of JOHN.

How to ignore the middle optional word 10BC and always return the first word as a substring for STUDENT using REGEXP_SUBSTR. Thanks in Advance!!!

CodePudding user response：

Do you really need to check what's in the middle? Perhaps you could ignore that and work with the rest of the requirement (check for STU or STUDENT (as your sample data suggest) and fetch data according to that)?

Here are two options (result_1 and result_2), see if any of these help.

Sample data:

SQL> with test (id, col) as
  2    (select 1, 'JOHN 10BC STUDENT'  from dual union all
  3     select 2, 'JOHN  STUDENT'      from dual union all
  4     select 3, 'JOHN 10BC STU'      from dual union all
  5     select 4, 'JOHN 10BC TEACHER'  from dual union all
  6     select 5, 'JOHN  TEACHER'      from dual
  7    )

Query begins here:

  8  select id,
  9    col,
 10    case when instr(col, 'STU') > 0 then substr(col, 1, instr(col, ' ') - 1)
 11         else null
 12    end result_1,
 13    --
 14    case when regexp_substr(col, '\w $') in ('STU', 'STUDENT') then
 15              regexp_substr(col, '^\w ')
 16         else null
 17    end result_2
 18  from test;

        ID COL               RESULT_1             RESULT_2
---------- ----------------- -------------------- --------------------
         1 JOHN 10BC STUDENT JOHN                 JOHN
         2 JOHN  STUDENT     JOHN                 JOHN
         3 JOHN 10BC STU     JOHN                 JOHN
         4 JOHN 10BC TEACHER
         5 JOHN  TEACHER

SQL>

result_1 searches col for the STU string; if it is found, it returns the first word (from beginning of col up to the first space character)
result_2 checks whether the last word (but this time using regexp_substr, anchoring to the end $ of col) is "STU" or "STUDENT" and returns the first word (anchored to the beginning ^ of col

CodePudding user response：

Use non-greedy pattern for any-character match (.*) by adding a question mark after it (.*?) to prevent it from grabbing an optional part ((\s .*BC)?):

with test (id, col) as (
  select 1, 'JOHN 10BC STUDENT'  from dual union all
  select 2, 'JOHN  STUDENT'      from dual union all
  select 3, 'JOHN 10BC STU'      from dual union all
  select 4, 'JOHN 10BC TEACHER'  from dual union all
  select 4, 'JSTUOHN 10BC TEACHER'  from dual union all
  select 5, 'JOHN  TEACHER'      from dual
)
select
  id,
  col,
  regexp_substr(col, '(.*?)(\s .*BC)?\sSTU(DENT)?',1,1,'i',1) as qwe
from test

ID | COL                  | QWE  
-: | :------------------- | :----
 1 | JOHN 10BC STUDENT    | JOHN 
 2 | JOHN  STUDENT        | JOHN 
 3 | JOHN 10BC STU        | JOHN 
 4 | JOHN 10BC TEACHER    | null 
 4 | JSTUOHN 10BC TEACHER | null 
 5 | JOHN  TEACHER        | null

db<>fiddle here