Home > Software engineering >  Oracle REGEXP_SUBSTR - SEMICOLON STRING EXTRACTION
Oracle REGEXP_SUBSTR - SEMICOLON STRING EXTRACTION

Time:06-24

I have below input and need mentioned output. How I can get it. I tried different pattern but could not get through it.

so in brief, any value having all three 1#2#3 parts(if it is present) or first value should be returned

2#9#;2#37#65 ->  2#37#65
2#9#;2#37#65;2#37#  -> 2#37#65 
2#9#;2#37#65;2#37#;2#37#56  -> 2#37#65 or 2#37#56
2#37#65;2#99  -> 2#37#65
3#9#;3#37#65;3#37#36;2#37#56  -> 3#37#65 or 3#37#36 or 2#37#56
2#37#;2#99#  -> 2#37 or 2#99# ( in this case any value)

I tried few patterns and other pattern but no help.

    regexp_substr('2#9#;2#37#65;2#37#','#[^;] ',1)
    SUBSTR(REGEXP_SUBSTR(SUBSTR(uo_filiere,1,INSTR(uo_filiere,';',1)-1), '#[^#] $'),2)

CodePudding user response:

You can use a REGEXP_REPLACE here:

REGEXP_REPLACE(uo_filiere, '^(.*;)?([0-9] (#[0-9] ){2,}).*|^([^;] ).*', '\2\4')

See the regexp demo

Details:

  • ^ - start of string
  • (.*;)? - an optional Group 1 capturing any text and then a ;
  • ([0-9] (#[0-9] ){2,}) - Group 2 (\2): one or more digits, and then two or more occurrences of # followed with one or more digits
  • .* - the rest of the string
  • | - or
  • ^([^;] ).* - start of string, Group 4 capturing one or more chars other than ; and then any text till end of string.

The replacement is Group 2 Group 4 values.

CodePudding user response:

Here is a simple-minded way to solve this. It may prove more efficient than other approaches, given the particular nature of the problem.

First, use a regular expression to find the first token that has all three parts. This part of the solution should be the most efficient approach for those strings that do have a three-part token, and it performs work that must be performed on all input strings in any case.

In the second part, wrap within nvl - if no three-part token is found, select the first token regardless of how many parts are present. This part uses only substr and instr in a trivial manner, so it should be very fast too.

Here's the query, run on a few more sample inputs to test those cases too.

with
  sample_data (uo_filiere) as (
    select '2#9#;2#37#65'                 from dual union all
    select '2#9#;2#37#65;2#37#'           from dual union all
    select '2#9#;2#37#65;2#37#;2#37#56'   from dual union all
    select '2#37#65;2#99'                 from dual union all
    select '3#9#;3#37#65;3#37#36;2#37#56' from dual union all
    select '2#37#;2#99#'                  from dual union all
    select '1#22#333'                     from dual union all
    select '33#444#'                      from dual
  )
select uo_filiere,
       nvl(regexp_substr(uo_filiere, '(;|^)(([^;#] #){2}[^;] )', 1, 1, null, 2)
          , substr(uo_filiere, 1, instr(uo_filiere || ';', ';') - 1)
          ) as first_value
from   sample_data
;

UO_FILIERE                    FIRST_VALUE                 
----------------------------  ----------------------------
2#9#;2#37#65                  2#37#65                     
2#9#;2#37#65;2#37#            2#37#65                     
2#9#;2#37#65;2#37#;2#37#56    2#37#65                     
2#37#65;2#99                  2#37#65                     
3#9#;3#37#65;3#37#36;2#37#56  3#37#65                     
2#37#;2#99#                   2#37#                       
1#22#333                      1#22#333                    
33#444#                       33#444#  
  • Related