E-mail Regex causing catastrophic backtracking error

Time:12-01

I am on the verge of losing my mind trying to fix an email regex I built.

It is almost perfect for what I need. It works in 99.9% of all cases.

But there is one case that causes a catastrophic backtracking error, and I cannot fix my regex for it.

The "email" causing the catastrophic backtracking error:

[email protected]@tester.co.ro

Yes, such emails do occur in the application I need this regex for.

People enter multiple emails in one field for some reason. I have no answer for why this occurs.

I need the help of the wizards of Stack Overflow.

My email regex might block or not block some officially valid emails, but that is not the point here.

All I want is to fix the catastrophic backtracking problem of my regex. I do not want to change what it blocks or allows. It works for what I need it to do.

Here is my Email Regex:

^[^\W_]+\w*(?:[.-]\w*)*[^\W_]+@[^\W_]+(?:[.-]?\w*[^\W_]+)*(?:\.[^\W_]{2,})$

How can I make this regex fail quickly so it doesn't cause a catastrophic backtracking error?

Thank you very much.

CodePudding user response:

You can use

^(?!_)\w+(?:[.-]\w+)*(?<!_)@[^\W_]+(?>[.-]?\w*[^\W_])*\.[^\W_]{2,}$

An ASCII only version:

^(?!_)[A-Za-z0-9_]+(?:[.-][A-Za-z0-9_]+)*(?<!_)@[A-Za-z0-9]+(?>[.-]?[A-Za-z0-9_]*[A-Za-z0-9])*\.[A-Za-z0-9]{2,}$

CodePudding user response:

I would go with this one:

^[^\W_](?:[\w.-]*[^\W_])?@[^\W_](?:\w*[^\W_])?(?:[.-]\w*[^\W_])*\.[^\W_]{2,}$

Demo: https://regex101.com/r/bzVNd1/1
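
For a quick check outside regex101, here is a minimal Python sketch that runs the pattern above against a few inputs. The sample addresses are made up for illustration (they are not the ones from the question), and the names `EMAIL_RE` and `is_valid` are hypothetical.

```python
import re

# Pattern from this answer, copied unchanged.
EMAIL_RE = re.compile(
    r"^[^\W_](?:[\w.-]*[^\W_])?@[^\W_](?:\w*[^\W_])?(?:[.-]\w*[^\W_])*\.[^\W_]{2,}$"
)


def is_valid(address: str) -> bool:
    """Return True if the address matches the pattern."""
    return EMAIL_RE.match(address) is not None


# Hypothetical inputs, including two addresses typed into one field
# and a longer variant of the same mistake.
samples = [
    "john.doe@example.com",
    "a@b.co",
    "_john@example.com",
    "john.doe@example.com@tester.co.ro",
    "john" + ".doe" * 50 + "@example.com@tester.co.ro",
]

for sample in samples:
    print(is_valid(sample), sample[:40])
```

The two inputs containing a second `@` should be rejected without any noticeable delay, because every repeated group in this pattern starts with a mandatory character and leaves the engine only one way to split the input.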

A comparison:

before: ^[^\W_]+   \w*(?:[.-]\w*)*[^\W_]+   @[^\W_]+              (?:[.-]?\w*[^\W_]+)*(?:\.[^\W_]{2,})$
after:  ^[^\W_] (?:[\w.-]*        [^\W_] )? @[^\W_] (?:\w*[^\W_])?(?:[.-] \w*[^\W_] )*   \.[^\W_]{2,} $
               ^   ^                    ^          ^      ^              ^   ^     ^  ^
               ^   ^                    ^          ^      ^              ^   ^     ^  No reason to use a group
               ^   ^                    ^          ^      ^              ^   ^     This quantifier was useless
               ^   ^                    ^          ^      ^              ^   If you want to match only letters before the possible [.-]
               ^   ^                    ^          ^      ^              Now, when you match [.-] there is no reason to make it optional
               ^   ^                    ^          ^      If you want to match only letters before the possible [.-]
               ^   ^                    ^          Now the + quantifier is useless
               ^   ^                    The + quantifier was useless
               ^   \w*(?:[.-]\w*)* seems to be equivalent to [\w.-]*
               The + quantifier was useless
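
The claim that `\w*(?:[.-]\w*)*` is equivalent to `[\w.-]*` can be spot-checked by brute force. The sketch below is a bounded check, not a proof: it compares both sub-patterns on every string up to a small length over a small test alphabet. The alphabet, the length limit, and the name `agree` are assumptions chosen for illustration.

```python
import re
from itertools import product

nested = re.compile(r"\w*(?:[.-]\w*)*")  # sub-pattern from the original regex
flat = re.compile(r"[\w.-]*")            # proposed simplification

# Small test alphabet: a letter, a digit, underscore, both separators,
# and one character ("@") that neither pattern should accept.
ALPHABET = "a1_.-@"


def agree(max_len: int = 5) -> bool:
    """Return True if both patterns accept exactly the same strings
    of length 0..max_len over ALPHABET."""
    for length in range(max_len + 1):
        for chars in product(ALPHABET, repeat=length):
            candidate = "".join(chars)
            if bool(nested.fullmatch(candidate)) != bool(flat.fullmatch(candidate)):
                return False
    return True


print(agree())
```

The strings are kept short on purpose, since the nested form is the part of the original regex that can be split in many ways.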

If you want to restrict the range of \w:

^[A-Za-z0-9](?:[A-Za-z0-9_.-]*[A-Za-z0-9])?@[A-Za-z0-9](?:[A-Za-z0-9_]*[A-Za-z0-9])?(?:[.-][A-Za-z0-9_]*[A-Za-z0-9])*\.[A-Za-z0-9]{2,}$
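
To illustrate what restricting the range changes, here is a small Python sketch. It assumes Python's `re` module, where `\w` on `str` patterns matches Unicode word characters by default. The accented address is a made-up example, and `UNICODE_RE`, `ASCII_RE`, and `FLAGGED_RE` are hypothetical names.

```python
import re

# \w-based pattern from this answer (Unicode-aware in Python by default).
UNICODE_RE = re.compile(
    r"^[^\W_](?:[\w.-]*[^\W_])?@[^\W_](?:\w*[^\W_])?(?:[.-]\w*[^\W_])*\.[^\W_]{2,}$"
)

# Explicit ASCII ranges, as in the restricted version above.
ASCII_RE = re.compile(
    r"^[A-Za-z0-9](?:[A-Za-z0-9_.-]*[A-Za-z0-9])?"
    r"@[A-Za-z0-9](?:[A-Za-z0-9_]*[A-Za-z0-9])?"
    r"(?:[.-][A-Za-z0-9_]*[A-Za-z0-9])*\.[A-Za-z0-9]{2,}$"
)

# Python-specific alternative: keep \w but compile with re.ASCII.
FLAGGED_RE = re.compile(UNICODE_RE.pattern, re.ASCII)

plain = "jose@example.com"
accented = "jos\u00e9@ex\u00e4mple.com"  # hypothetical address with non-ASCII letters

for name, pattern in [("unicode", UNICODE_RE), ("ascii", ASCII_RE), ("flagged", FLAGGED_RE)]:
    print(name, bool(pattern.match(plain)), bool(pattern.match(accented)))
```

Other regex engines treat `\w` differently, so whether the restricted version is needed depends on the flavor in use.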