Why is this .net regex failing in freespace mode?

Time:01-06

I am trying to use a regex I made in freespace mode, so I can add comments and keep it maintainable for the future. The program below is a sample of the problem, but the actual regex where I ran into this problem is much more complex.

        private static readonly string regexTest1 = @"(?((?=A))(A)|(\w))";
        private static readonly string regexTest2 = @"(?x)
        (?        # If ...
        ((?=A))   # next character is A
        (A)|      # Capture in group 1, else ...
        (\w))     # Capture in group 2 (and EndIf).
        ";

        static void Main(string[] args)
        {
            Match m1 = new Regex(regexTest1).Match("A");
            Match m2 = new Regex(regexTest2).Match("A");  // Exception!
        }

When I try to use regexTest2, the program breaks with a "token not recognized" exception. The problem seems to be the conditional '(?' and its closing parenthesis three lines further down, but why can't they be split over several lines?

As a workaround I'm using a concatenated string like this ...

private static readonly string regexTest2 =
    @"(?"        +   // If ...
    @"((?=A))"   +   // next character is A
    @"(A)|"      +   // Capture in group 1, else ...
    @"(\w))";        // Capture in group 2 (and EndIf).

But I find the @'s and "" distract too much from the regex parts. What are the restrictions in .NET for breaking up a regex this way? Is there another (clearer) way I'm overlooking?

CodePudding user response:

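It looks like (? on its own line causes a problem for the parser. I don't know exactly why, but the most likely reason is that the characters introducing a construct, such as (?( or (?:, form a single token, and ignored whitespace is not allowed inside a token. It's reasonably easy to work around: keep the opening (?( together and put the comment after it. I would personally use the explicit RegexOptions rather than the (?x), but both work: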

using System.Text.RegularExpressions;

string pattern = @"
(?((?=A)) # If next character is A
(A)|      # Capture in group 1, else ...
(\w))     # Capture in group 2 (and EndIf).
";

Match match = new Regex(pattern, RegexOptions.IgnorePatternWhitespace).Match("A");

Or:

string pattern =
@"(?x)    # Ignore pattern whitespace
(?((?=A)) # If next character is A
(A)|      # Capture in group 1, else ...
(\w))     # Capture in group 2 (and EndIf).
";

Match match = new Regex(pattern).Match("A");
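
As a quick check of the working layout against the question's layout, here is a minimal sketch. The class name ConditionalDemo and the helpers Describe and FailsToParse are made up for illustration. It assumes, as the question reports, that the split (? form is rejected when the Regex is constructed; in .NET a pattern parse error surfaces as an ArgumentException or a subclass of it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // Same pattern as above: "(?(" is kept together and the comment follows it.
    public const string Pattern = @"
(?((?=A)) # If next character is A
(A)|      # Capture in group 1, else ...
(\w))     # Capture in group 2 (and EndIf).
";

    // The question's layout, with "(?" alone on its own line.
    public const string SplitPattern = @"(?x)
(?        # If ...
((?=A))   # next character is A
(A)|      # Capture in group 1, else ...
(\w))     # Capture in group 2 (and EndIf).
";

    // Reports which numbered group captured the input character.
    public static string Describe(string input)
    {
        Match m = new Regex(Pattern, RegexOptions.IgnorePatternWhitespace).Match(input);
        if (!m.Success) return "no match";
        return m.Groups[1].Success
            ? "group 1: " + m.Groups[1].Value
            : "group 2: " + m.Groups[2].Value;
    }

    // Returns true if the Regex constructor rejects the pattern.
    public static bool FailsToParse(string pattern)
    {
        try { _ = new Regex(pattern); return false; }
        catch (ArgumentException) { return true; }
    }

    public static void Main()
    {
        Console.WriteLine(Describe("A")); // group 1: A
        Console.WriteLine(Describe("B")); // group 2: B
        Console.WriteLine(FailsToParse(SplitPattern));
    }
}
```

The last line should report that the split layout fails to parse, which is the exception the question describes.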

Note that while it would be more readable to write:

string pattern = @"
(?x) # Ignore pattern whitespace
...
";

... that won't work, because the line break before the option is read before (?x) takes effect and is therefore treated as a literal character that has to be matched. It doesn't throw an exception (it's a valid regex) but it doesn't match as you want it to.
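
To see the effect of that leading line break in isolation, here is a small sketch with a deliberately simplified pattern. The class LeadingNewlineDemo and its members are hypothetical names, and explicit "\n" escapes are used instead of verbatim strings so the line ending does not depend on how the source file is saved.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingNewlineDemo
{
    // "\n" comes before (?x), so it is still a literal newline to match.
    public const string OptionAfterNewline = "\n(?x) (A) # capture A";

    // (?x) is the first thing in the pattern, so all later whitespace is ignored.
    public const string OptionFirst = "(?x)\n (A) # capture A";

    public static bool Matches(string pattern, string input)
    {
        return new Regex(pattern).IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(Matches(OptionAfterNewline, "A"));   // False
        Console.WriteLine(Matches(OptionAfterNewline, "\nA")); // True
        Console.WriteLine(Matches(OptionFirst, "A"));          // True
    }
}
```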

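If you definitely want to separate the outermost grouping construct from the first one, you can give it a name. Bear in mind that (?'outer' ...) is a named capturing group around an alternation rather than a conditional: it matches the same inputs here, but ((?=A)) now counts as a capturing group of its own, so the group numbers in the comments no longer apply as written: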

using System.Text.RegularExpressions;

string pattern = @"
(?'outer' # If...
((?=A))   # next character is A
(A)|      # Capture in group 1, else ...
(\w))     # Capture in group 2 (and EndIf).
";

Match match = new Regex(pattern, RegexOptions.IgnorePatternWhitespace).Match("A");
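
If you go the named-group route, it is worth checking which group ends up holding the character. The sketch below is one way to do that. NamedGroupDemo and Run are illustrative names, and the comments in the pattern have been reworded to describe a named group. It relies on .NET's default numbering, where unnamed groups are numbered first and named groups after them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    public const string Pattern = @"
(?'outer' # Named group wrapping the whole alternation
((?=A))   # Unnamed group around the lookahead (captures an empty string)
(A)|      # Unnamed group for A, else ...
(\w))     # Unnamed group for any other word character
";

    public static Match Run(string input)
    {
        return new Regex(Pattern, RegexOptions.IgnorePatternWhitespace).Match(input);
    }

    public static void Main()
    {
        Regex regex = new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);
        Match m = regex.Match("A");

        // List every group by name or number with what it captured.
        foreach (string name in regex.GetGroupNames())
        {
            Group g = m.Groups[name];
            Console.WriteLine(name + ": success=" + g.Success + ", value='" + g.Value + "'");
        }
    }
}
```

Reading the result through m.Groups["outer"] avoids depending on the numbers at all.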