Home > Blockchain >  Oracle REGEXP_LIKE logical and matching of substrings in string
Oracle REGEXP_LIKE logical and matching of substrings in string

Time:03-15

I have a string containing codes like 'code1 code2 code3'. It should return the string if all codes entered are contained in the string.

For example:

select * from (
select 'avs cde jkl' code from dual)
where REGEXP_LIKE(code, 'REGEX-MAGIC') 

When the regex is now something like ^(?=.*\bjkl\b)(?=.*\bavs\b).*$ then it should return the code. But this syntax is not working for regex in oracle.

The logic is 'if all codes looked for are in the string (order does not matter), then return the code.'

I have researched and this would be achievable with a positive lookahead, but oracle does not support this as far as I know. I would search for one regex and not a construct like REGEXP_LIKE(...,..) and REGEXP_LIKE(...,..) and ....

The Oracle Version is 12c.

Any help would be appreciated!

CodePudding user response:

I'm not good at regex-magix, but - see if something like this helps.

This is a table that contains those codes:

SQL> select * from codes;

        ID CODE
---------- -----------
         1 avs cde jkl
         2 xyz avs

Query

  • splits every code into rows (t_split CTE)
  • does the same for the entered parameter (par_string) value (p_split CTE)
  • why? So that they could act as if they were rows in a table, and you can apply the MINUS set operator
  • if MINUS returns nothing, there's a match; otherwise it's a mismatch

SQL> with
  2  -- split code to rows
  3  t_split as
  4    (select id,
  5            code original_code,
  6            regexp_substr(code, '[^ ] ', 1, column_value) code
  7      from codes cross join
  8        table(cast(multiset(select level from dual
  9                            connect by level <= regexp_count(code, ' ')   1
 10                           ) as sys.odcinumberlist))
 11      where id = &&par_id
 12    ),
 13  -- split parameter to rows
 14  p_split as
 15    (select regexp_substr('&&par_string', '[^ ] ', 1, level) code
 16     from dual
 17     connect by level <= regexp_count('&&par_string', ' ')   1
 18    )
 19  --
 20  -- if all parameter's "pieces" of code are contained in CODE value, MINUS returns nothing
 21  -- so there's a match
 22  select distinct t.original_code,
 23         '&&par_string' par_string,
 24         case when (select count(*)
 25                    from (select code from t_split
 26                          minus
 27                          select code from p_split
 28                         )
 29                   ) = 0 then 'Match'
 30              else 'Mismatch'
 31         end result
 32  from t_split t
 33  where t.id = &&par_id;

Enter value for par_id: 1
Enter value for par_string: jkl avs cde

ORIGINAL_CO PAR_STRING  RESULT
----------- ----------- --------
avs cde jkl jkl avs cde Match

SQL> undefine par_string
SQL> /
Enter value for par_string: avs jkl www

ORIGINAL_CO PAR_STRING  RESULT
----------- ----------- --------
avs cde jkl avs jkl www Mismatch

SQL>

Depending on tool you use (this is SQL*Plus), you might need to replace && with a colon :; or, convert such a piece of code to a function.

CodePudding user response:

Oracle does not support look-ahead, look-behind or word boundaries in regular expressions.

If you have the sample data:

CREATE TABLE table_name (code) AS
SELECT 'avs cde jkl' FROM DUAL UNION ALL
SELECT 'avs cde'     FROM DUAL UNION ALL
SELECT 'jkl avs'     FROM DUAL UNION ALL
SELECT 'cde jkl'     FROM DUAL;

Option 1:

The simplest query is to not use regular expressions and to look for sub-string matches using multiple LIKE conditions:

SELECT code
FROM   table_name
WHERE  ' ' || code || ' ' LIKE '% avs %'
AND    ' ' || code || ' ' LIKE '% jkl %'

Which outputs:

CODE
avs cde jkl
jkl avs

Option 2:

You could use (slower) regular expressions with multiple REGEXP_LIKE conditions:

SELECT code
FROM   table_name
WHERE  REGEXP_LIKE(code, '(^| )avs( |$)')
AND    REGEXP_LIKE(code, '(^| )jkl( |$)')

Which outputs the same as above.

Option 3:

You could put the matches into a sub-query factoring clause and then use a LATERAL join:

WITH match_conditions (match) AS (
  SELECT 'avs' FROM DUAL UNION ALL
  SELECT 'jkl' FROM DUAL
)
SELECT code
FROM   table_name t
       CROSS JOIN LATERAL (
         SELECT 1
         FROM   match_conditions
         WHERE  ' ' || code || ' ' LIKE '% ' || match || ' %'
         HAVING COUNT(*) = (SELECT COUNT(*) FROM match_conditions)
       )

Which outputs the same as above.

Option 4:

If you really want a single regular expression then you can generate each permutation of the codes to match and concatenate them into a single regular expression:

SELECT code
FROM   table_name
WHERE  REGEXP_LIKE(
         code,
             '(^| )avs( | .*? )jkl( |$)' -- Permutation 1
         || '|(^| )jkl( | .*? )avs( |$)' -- Permutation 2
       )

Which outputs the same as above.

However, this is going to get problematic to maintain as the number of codes to match grows as, for 2 items there are 2 permutations but for 5 items there are 5! = 120 permutations.

Option 5:

You could declare a nested table collection:

CREATE TYPE string_list AS TABLE OF VARCHAR2(20);

Then split the string (again, you do not need slow regular expressions) and then compare it to a nested table:

WITH bounds (rid, code, spos, epos) AS (
  SELECT ROWID, code, 1, INSTR(code, ' ', 1)
  FROM   table_name
UNION ALL
  SELECT rid, code, epos   1, INSTR(code, ' ', epos   1)
  FROM   bounds
  WHERE  epos > 0
)
SEARCH DEPTH FIRST BY code SET order_rn
SELECT MAX(code) AS code
FROM   bounds
GROUP BY rid
HAVING string_list('avs', 'jkl') SUBMULTISET OF CAST(
         COLLECT(
           CAST(
             CASE epos
             WHEN 0
             THEN SUBSTR(code, spos)
             ELSE SUBSTR(code, spos, epos - spos)
             END
             AS VARCHAR2(20)
           )
         )
         AS string_list
       );

Depending on the client application you are using, you can pass the entire string_list('avs', 'jkl') collection in as a single bind variable that you can populate from an array. Java (and some languages built on top of Java) using an ODBC driver can do this; C# cannot directly but you can pass an associative array and convert it to a nested table collection with a helper function.

Which outputs the same as above.

db<>fiddle here

  • Related