I am working on regex_substr
to get the following result in oracle
Test 1.Input: JOHN 10BC STUDENT Desired Output: JOHN
Test 2.Input: JOHN STUDENT Desired Output: JOHN
Test 3.Input: JOHN 10BC STU Desired Output: JOHN
Test 4.Input: JOHN 10BC TEACHER Desired Output:NULL
Test 5.Input: JOHN TEACHER Desired Output:NULL
Query:
Test 1:
select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s .*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Failed . Returned JOHN 10BC instead of JOHN
Test 2:
select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s .*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Passed. Returned JOHN
I modified the query as below by removing the ? in the second block
Test 1:select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s .*BC)\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Passed. Returned JOHN
Test 2:select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s .*BC)\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Failed. Returned Null instead of JOHN.
How to ignore the middle optional word 10BC and always return the first word as a substring for STUDENT using REGEXP_SUBSTR
. Thanks in Advance!!!
CodePudding user response:
Do you really need to check what's in the middle? Perhaps you could ignore that and work with the rest of the requirement (check for STU or STUDENT (as your sample data suggest) and fetch data according to that)?
Here are two options (result_1
and result_2
), see if any of these help.
Sample data:
SQL> with test (id, col) as
2 (select 1, 'JOHN 10BC STUDENT' from dual union all
3 select 2, 'JOHN STUDENT' from dual union all
4 select 3, 'JOHN 10BC STU' from dual union all
5 select 4, 'JOHN 10BC TEACHER' from dual union all
6 select 5, 'JOHN TEACHER' from dual
7 )
Query begins here:
8 select id,
9 col,
10 case when instr(col, 'STU') > 0 then substr(col, 1, instr(col, ' ') - 1)
11 else null
12 end result_1,
13 --
14 case when regexp_substr(col, '\w $') in ('STU', 'STUDENT') then
15 regexp_substr(col, '^\w ')
16 else null
17 end result_2
18 from test;
ID COL RESULT_1 RESULT_2
---------- ----------------- -------------------- --------------------
1 JOHN 10BC STUDENT JOHN JOHN
2 JOHN STUDENT JOHN JOHN
3 JOHN 10BC STU JOHN JOHN
4 JOHN 10BC TEACHER
5 JOHN TEACHER
SQL>
result_1
searchescol
for theSTU
string; if it is found, it returns the first word (from beginning ofcol
up to the first space character)result_2
checks whether the last word (but this time using regexp_substr, anchoring to the end$
ofcol
) is "STU" or "STUDENT" and returns the first word (anchored to the beginning^
ofcol
CodePudding user response:
Use non-greedy pattern for any-character match (.*
) by adding a question mark after it (.*?
) to prevent it from grabbing an optional part ((\s .*BC)?
):
with test (id, col) as ( select 1, 'JOHN 10BC STUDENT' from dual union all select 2, 'JOHN STUDENT' from dual union all select 3, 'JOHN 10BC STU' from dual union all select 4, 'JOHN 10BC TEACHER' from dual union all select 4, 'JSTUOHN 10BC TEACHER' from dual union all select 5, 'JOHN TEACHER' from dual ) select id, col, regexp_substr(col, '(.*?)(\s .*BC)?\sSTU(DENT)?',1,1,'i',1) as qwe from test
ID | COL | QWE -: | :------------------- | :---- 1 | JOHN 10BC STUDENT | JOHN 2 | JOHN STUDENT | JOHN 3 | JOHN 10BC STU | JOHN 4 | JOHN 10BC TEACHER | null 4 | JSTUOHN 10BC TEACHER | null 5 | JOHN TEACHER | null
db<>fiddle here