Home > OS >  Regex Like for ORACLE with lookahead and negative lookahead
Regex Like for ORACLE with lookahead and negative lookahead

Time:09-01

I am working with an programm which uploads emailadresses to another programm - but it accepts emails only in one way:

i tried to write a reglular expression to filter out emailadresse which are not accepted

^(?:([A-Za-z0-9!#$%* -.=?~|`_^]{1,64})|(\"[A-Za-z0-9!#$%* -.=?~|`_^(){}<>@,;: \[\]]{1,64}\"))\@(?!\.)(?!\-)(?!.*\.$)(?!.*\.\.)([A-Za-z0-9.-]{1,61})\.([a-z]{2,10})$

The description says: username@domain

The at sign ('@') must be present and not first or last character. The length of the name can have up to and including 64 characters. The length of the domain can have up to and including 64 characters. All email addresses are forced to lowercase when the email is sent. Therefore any email addresses requiring uppercase will most likely not be delivered correctly by the ISP as we will have changed it to lowercase. username

Can contain:

  • A-Z
  • a-z
  • 0-9
  • ! # $ % * - . = ? ~ | ` _ ^

The entire name can be surrounded by double quotes (though this is not supported by many ISPs). In this case, the following additional characters are allowed between the quotes - ( ) { } < > @ , ; : [ ] (space)

domain

Can contain:

  • A-Z
  • a-z
  • 0-9

Cannot contain 2 or more consecutive periods Must contain at least 1 period Domain - Cannot begin or end with a period or dash

also the part with [] does not work

Thanks for your help.

CodePudding user response:

Oracle does not, natively, support non-capturing groups, look-ahead or look-behind in regular expressions.


However, if you have Java enabled in the database then you can compile a Java class:

CREATE AND COMPILE JAVA SOURCE NAMED RegexParser AS
import java.util.regex.Pattern;

public class RegexpMatch {
  public static int match(
    final String value,
    final String regex
  ){
    final Pattern pattern = Pattern.compile(regex);

    return pattern.matcher(value).matches() ? 1 : 0;
  }
}
/

And create a PL/SQL wrapper function:

CREATE FUNCTION regexp_java_match(value IN VARCHAR2, regex IN VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA NAME 'RegexpMatch.match( java.lang.String, java.lang.String ) return int';
/

and then you can use your regular expression (or any other regular expression that Java supports):

SELECT REGEXP_JAVA_MATCH(
         '[email protected]',
         '^(?:([A-Za-z0-9!#$%* -.=?~|`_^]{1,64})|(\"[A-Za-z0-9!#$%* -.=?~|`_^(){}<>@,;: \[\]]{1,64}\"))\@(?!\.)(?!\-)(?!.*\.$)(?!.*\.\.)([A-Za-z0-9.-]{1,61})\.([a-z]{2,10})$'
       ) AS match
FROM   DUAL

Which outputs:

MATCH
1

db<>fiddle here

CodePudding user response:

Your regular expression can be re-written into a format that Oracle supports as:

  • (?:) non-capturing group are not supported and should just be a () capturing group instead.
  • Look-ahead is not supported but you can rewrite the look-ahead patterns using character list so @(?!\.)(?!-)([A-Za-z0-9.-]{1,61}) can be rewritten as @[A-Za-z0-9][A-Za-z0-9.-]{0,60}.
  • The (?!.*\.$) look-ahead is redundant as the pattern ends with ([a-z]{2,10})$ and can never match a trailing ..
  • If you want to include ] and - in a character list then ] should be the first character and - the last in the set.
  • The only thing that cannot be implemented in an Oracle regular expression is simultaneously restricting the length of the post-@ segment and ensuring there are no .. double dots; to do that you need to check for one of those two conditions in a second regular expression.
SELECT REGEXP_SUBSTR(
         REGEXP_SUBSTR(
           '[email protected]',
           '^('
             -- Unquoted local-part
             || '[A-Za-z0-9!#$%* .=?~|`_^-]{1,64}'
           -- or
           || '|'
             -- Quoted local-part
             || '"[]A-Za-z0-9!#$%* .=?~|`_^(){}<>@,;: [-]{1,64}"'
           || ')@'
           -- Domains
           || '[A-Za-z0-9]([A-Za-z0-9.-]{0,60})?'
           -- Top-level domain
           || '\.[a-z]{2,10}$'
         ),
         -- Local-part
         '^([^"]*?|".*?")'
         || '@'
         -- Domains - exclude .. patterns
         || '([^.] \.) [a-z]{2,10}$'
       ) AS match
FROM   DUAL

Or, using POSIX character lists:

SELECT REGEXP_SUBSTR(
         REGEXP_SUBSTR(
           '[email protected]',
           '^('
             -- Unquoted local-part
             || '[[:alnum:]!#$%* .=?~|`_^-]{1,64}'
           -- or
           || '|'
             -- Quoted local-part
             || '"[][:alnum:]!#$%* .=?~|`_^(){}<>@,;: [-]{1,64}"'
           || ')@'
           -- Domains
           || '[[:alnum:]]([[:alnum:].-]{0,60})?'
           -- Top-level domain
           || '\.[[:lower:]]{2,10}$'
         ),
         -- Local-part
         '^([^"]*?|".*?")'
         || '@'
         -- Domains
         || '([^.] \.) [[:lower:]]{2,10}$'
       ) AS match
FROM   DUAL

Which both output:

MATCH
[email protected]

db<>fiddle here

  • Related