I want to write a case statement which can extract value for a particular substring from a column named details
which has multiple occurrences for [address]
is it possible with REGEX along with case?
sample Data in the column:
[address]:kattengat 28
[address]:1012 SZ
[address]: Amsterdam
The below SQL only outputs:
kattengat 28
Select case when to_number(instr(details),'[address')>0 then substr(details,REGEXP_INSTR(details,'address',1,1) 8,instr(substr(details,REGEXP_INSTR(details,'address',1,1) 8,'[')-1)) else '' end from table_name;
Expected output is :
kattengat 28
1012 SZ
Amsterdam
Create table statement:
Create table test (id number(10), details clob);
Insert statement :
insert into test (id, details) values (1,to_clob ('[ADDRESS ] kattengat 28
[NAME ] ALEX
[PHONE ] 65438
[ADDRESS ] 1012 SZ
[DOB ] 1st Jan 1998
[ADDRESS ] Amsterdam')):
Please note I don't want to concat and add statements rather looking for a solution which can extract values associated with the substring [address] based on the number of occurrences of the substring
CodePudding user response:
Here's one option:
SQL> with test (col) as
2 (select '[address]:kattengat 28
3 [address]:1012 SZ
4 [address]: Amsterdam' from dual
5 )
6 select trim(replace(regexp_substr(replace(col, chr(10), '#'), '[^#] ', 1, column_value), '[address]:', '')) result
7 from test cross join
8 table(cast(multiset(select level from dual
9 connect by level <= regexp_count(col, ':')
10 ) as sys.odcinumberlist));
RESULT
--------------------------------------------------------------------------------
kattengat 28
1012 SZ
Amsterdam
SQL>
What does it do?
- lines #1 - 5 - sample data
- line #6:
regexp_substr
part of code is responsible for splitting source column value into separate rows- it affects not the original value, but the one whose new line character (chr(10)) is replaced (the 2nd
replace
) by#
, and that character is used as a separator forregexp_substr
- the 1st
replace
removes[address]:
from the source trim
removes leading/trailing empty strings (as the one in front of "Amsterdam"- lines #8 - 10 are here to remove duplicate values from the final result (if source table doesn't contain only one row, as in my example). If it actually does, then code can be somewhat simplified.
With sample data you posted later (by the way, are you sure there are spaces in front of [NAME] etc.? I guess NOT!):
SQL> select * from test;
ID DETAILS
---------- --------------------------------------------------
1 [ADDRESS ] kattengat 28
[NAME ] ALEX
[PHONE ] 65438
[ADDRESS ] 1012 SZ
[DOB ] 1st Jan 1998
[ADDRESS ] Amsterdam
Code I previously posted, slightly modified because previously address
was in lower case, there were NO spaces within square brackets, and there was a colon sign):
SQL> with temp as
2 (select trim(replace(regexp_substr(replace(details, chr(10), '#'), '[^#] ', 1, column_value), '[ADDRESS ]', '')) result
3 from test cross join
4 table(cast(multiset(select level from dual
5 connect by level <= regexp_count(details, '\[')
6 ) as sys.odcinumberlist))
7 )
8 select *
9 from temp
10 where instr(result, '[') = 0;
RESULT
--------------------------------------------------------------------------------
kattengat 28
1012 SZ
Amsterdam
SQL>