Solve catastrophic backtracking in my email-detecting regex

Time:07-15

I have this regex

/^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,4})+$/

for validating email addresses. It works, but GitHub's code scanner reports this error:

This part of the regular expression may cause exponential backtracking on strings starting with 'a@a' and containing many repetitions of 'a'.
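To see what the scanner is describing, you can time the pattern on inputs of that shape. This is a minimal sketch, assuming Node.js or another backtracking JavaScript engine; `attackString` is a hypothetical helper, and the trailing "!" is an assumption (any character that forces the match to fail would do).

```javascript
// The pattern from the question, with its `+` quantifiers restored.
const emailRe = /^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,4})+$/;

// Hypothetical helper: build the input shape the scanner describes,
// "a@a" followed by n more "a"s, then a trailing "!" so the match must fail.
function attackString(n) {
  return "a@a" + "a".repeat(n) + "!";
}

// Keep n small: on a backtracking engine each extra "a" roughly
// doubles the number of ways the engine tries before giving up.
for (const n of [14, 16, 18, 20]) {
  const start = Date.now();
  const matched = emailRe.test(attackString(n));
  console.log(`n=${n} matched=${matched} elapsed=${Date.now() - start} ms`);
}
```

Exact timings depend on the machine and engine, but the elapsed time should grow roughly geometrically with n rather than linearly.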

I understand what the error means, but I'm not sure how to fix it.

Answer:

A good place to start is the Stack Overflow question "How can I recognize an evil regex?"

As one of the answers there says, the key is to avoid "repetition of a repetition". For instance, given (\w+)* and the input aaa, the engine could match it as (aaa), (a)(aa), (aa)(a), or (a)(a)(a); as the input gets longer, the number of possibilities grows exponentially. If you instead write (\w*), it matches exactly the same strings, but only in one way.
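A small sketch of why this blows up: the functions below enumerate and count every way the group in (\w+)* could carve a run of word characters into repetitions. This only illustrates the combinatorics, not how any real engine is implemented; `enumerateSplits` and `countSplits` are hypothetical names, and the input is assumed to contain only word characters.

```javascript
// List every way (\w+)* could split `s` into repetitions.
// Assumes `s` consists only of word characters.
function enumerateSplits(s) {
  if (s.length === 0) return [[]]; // zero repetitions
  const results = [];
  for (let len = 1; len <= s.length; len++) {
    const head = s.slice(0, len); // this repetition's \w+ takes `len` chars
    for (const rest of enumerateSplits(s.slice(len))) {
      results.push([head, ...rest]);
    }
  }
  return results;
}

// Count the same splits for a run of n word characters without listing them.
// ways[i] = number of ways to split the first i characters into repetitions.
function countSplits(n) {
  const ways = new Array(n + 1).fill(0);
  ways[0] = 1;
  for (let i = 1; i <= n; i++) {
    for (let j = 0; j < i; j++) ways[i] += ways[j]; // last repetition is chars j..i-1
  }
  return ways[n];
}

console.log(enumerateSplits("aaa"));
console.log(countSplits(30));
```

`enumerateSplits("aaa")` returns the four splits listed above, and `countSplits(30)` is already 536,870,912 (2^29). A backtracking engine may try many of those splits before concluding that a failing input doesn't match, whereas \w* has only one way to consume the run.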

In your case, you write ([.-]?\w+)* in two places, and because you've made the [.-] optional, it can match in all the ways that (\w+)* can. But text without a dot or dash is already matched by the \w+ just before it, so you can use ([.-]\w+)* instead.
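Applying that change to both halves of the question's pattern gives the version below. This is a sketch that keeps the rest of the pattern, including the (\.\w{2,4})+ suffix, exactly as in the question; `originalEmailRe` and `fixedEmailRe` are illustrative names. Because \w+([.-]?\w+)* and \w+([.-]\w+)* describe the same set of strings, the two patterns should accept and reject the same inputs; only the number of ways they can match differs.

```javascript
// Question's pattern (with the `+` quantifiers restored).
const originalEmailRe = /^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,4})+$/;
// Same pattern with the separator made mandatory inside each repeated group.
const fixedEmailRe = /^\w+([.-]\w+)*@\w+([.-]\w+)*(\.\w{2,4})+$/;

const samples = [
  "user@example.com",
  "first.last@sub.example.co.uk",
  "a-b@c-d.org",
  "user.@example.com",
  "a..b@example.com",
  "user@example",
];

for (const s of samples) {
  // Both patterns should give the same answer for every input.
  console.log(s, originalEmailRe.test(s), fixedEmailRe.test(s));
}

// The scanner's attack shape: the fixed pattern rejects it quickly,
// because each position in the run of "a"s can now be handled in only one way.
const attack = "a@a" + "a".repeat(10000) + "!";
console.log(fixedEmailRe.test(attack)); // false
```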

The string .aaa can now match in only one way, because a split like (.a)(aa) is rejected: the second group doesn't start with a dot or dash. Strings like aaa or ..a can't form a repetition of the group at all, because each repetition needs exactly one dot or dash followed by at least one character matching \w (which doesn't include . or -).
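The counting argument can be checked with a small dynamic program that counts how many ways a string splits into repetitions of the group, once with the separator optional ([.-]?\w+) and once with it required ([.-]\w+). This is an illustrative sketch that models only the repeated group, not the whole email pattern or a real engine's internals; `countGroupParses` and its helpers are hypothetical names.

```javascript
const isWord = (c) => /\w/.test(c);
const isSep = (c) => c === "." || c === "-";

// Does s.slice(j, i) form exactly one repetition of the group?
function isPiece(s, j, i, separatorRequired) {
  let k = j;
  if (isSep(s[k])) k++; // consume the dot or dash
  else if (separatorRequired) return false; // ([.-]\w+) must start with one
  if (k >= i) return false; // \w+ needs at least one character
  for (let m = k; m < i; m++) if (!isWord(s[m])) return false;
  return true;
}

// Number of ways the whole of `s` can be matched by (group)*.
function countGroupParses(s, separatorRequired) {
  const ways = new Array(s.length + 1).fill(0);
  ways[0] = 1; // zero repetitions
  for (let i = 1; i <= s.length; i++) {
    for (let j = 0; j < i; j++) {
      if (ways[j] > 0 && isPiece(s, j, i, separatorRequired)) ways[i] += ways[j];
    }
  }
  return ways[s.length];
}

for (const s of [".aaa", "aaa", "..a"]) {
  console.log(s, countGroupParses(s, false), countGroupParses(s, true));
}
```

With the separator optional, .aaa has 4 possible splits; with it required, it has exactly 1, and aaa and ..a have none.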
