How to match regular expression starting exactly at a given index?

Time:01-07

With the .NET Regex class, is there any way to match a regular expression inside a string only if the match starts exactly at a specific character index?

Let's look at an example:

  • regular expression ab
  • input string: ababab

Now, I can search for matches for the regular expression (named expr in the following) in the input string, for instance, starting at character index 2:

var match = expr.Match("ababab", 2);
//  match ------------->XXab

This will be successful and return a match at index 2.

If I pass index 1, this will also be successful, pointing to the same occurrence as above:

var match = expr.Match("ababab", 1);
//  match ------------->X ab

Is there any efficient way to have the second test fail, because the match does not start exactly at the specified index?

Obviously, there are some work-arounds to this. As the string I am testing might be ... "long" (think character counts possibly in the four-digit range), I would, however, prefer to avoid the overhead that would presumably occur, one way or another, in all three cases:

1. Work-around: I could check the resulting match to see whether its Index property matches the supplied index.
   Drawback: Matching throughout the entire string would still take place, at least until the first match is found (or the end of the string is reached).

2. Work-around: I could prepend the start anchor ^ to my regular expression and always test just the substring starting at the specified index.
   Drawback: As the string may be very long and I might be testing the same regex on multiple starting positions (but, again, only exactly on these), I am concerned about performance drawbacks from the frequent partial copying of the long string. (Ranges might be a way out here, but unfortunately, the Regex class cannot (yet?) be used to scan them.)

3. Work-around: I could prepend "^.{#}" (with # being replaced with the character index to test) for each expression and match from the beginning, then fish out the actually interesting match with a capturing group.
   Drawback: I need to test the same regex on multiple possible start positions throughout my input string. As the number of skipped characters changes each time, that would mean compiling a new regex every time, rather than re-using the one that I have, which again feels somewhat unclean.
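For concreteness, the first two work-arounds could be sketched roughly as follows. This is only an illustration: the class and method names (WorkAroundDemo, ByIndexCheck, BySubstring) are hypothetical, and the ^-anchored variant of the expression is assumed to be built separately from the original one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WorkAroundDemo
{
    // Work-around 1 (hypothetical helper): search from startIndex,
    // then accept the match only if it begins exactly there.
    public static bool ByIndexCheck(Regex expr, string input, int startIndex)
    {
        Match m = expr.Match(input, startIndex);
        return m.Success && m.Index == startIndex;
    }

    // Work-around 2 (hypothetical helper): copy the tail of the string
    // and test a ^-anchored variant of the expression against the copy.
    public static bool BySubstring(Regex anchoredExpr, string input, int startIndex)
    {
        return anchoredExpr.IsMatch(input.Substring(startIndex));
    }

    public static void Main()
    {
        var expr = new Regex("ab");
        var anchoredExpr = new Regex("^(?:ab)"); // assumed anchored variant
        const string input = "ababab";

        Console.WriteLine(ByIndexCheck(expr, input, 1));        // False
        Console.WriteLine(BySubstring(anchoredExpr, input, 1)); // False
        Console.WriteLine(ByIndexCheck(expr, input, 2));        // True
        Console.WriteLine(BySubstring(anchoredExpr, input, 2)); // True
    }
}
```

Both give the desired answer; the concern is only the extra scanning in the first and the string copy in the second.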

Lastly, the Match overload that accepts a maximum length to check in addition to the start index does not seem useful, as in my case, the regular expression is not fixed and may well include variable-length portions, so I have no idea about the expected length of a match in advance.

CodePudding user response:

It appears you can use the \G anchor: the pattern \Gab matches when the search starts at index 2 and fails when it starts at index 1, as this C# snippet shows:

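// Regex.Match never returns null, so the null-conditional operator is not needed,
// and a single Regex instance can be reused for every start index.
Regex expr = new Regex(@"\Gab");
Console.WriteLine(expr.Match("ababab", 1).Success); // => False
Console.WriteLine(expr.Match("ababab", 2).Success); // => True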

According to the documentation, the \G anchor behaves as follows:

"The match must occur at the point where the previous match ended, or if there was no previous match, at the position in the string where matching started."
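Applied to the original question, this suggests building one \G-anchored Regex once and reusing it for every start index. The following is a minimal sketch under that assumption; the names AnchoredMatchDemo and MatchesAt are hypothetical, and wrapping the original pattern in a non-capturing group is a precaution so that \G still applies to every branch if the pattern contains top-level alternation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredMatchDemo
{
    // Hypothetical helper: true only if a match begins exactly at startIndex.
    // Assumes the Regex passed in was built from a pattern that starts with \G.
    public static bool MatchesAt(Regex anchored, string input, int startIndex)
    {
        return anchored.Match(input, startIndex).Success;
    }

    public static void Main()
    {
        // Original pattern from the question, wrapped so that \G governs
        // the whole expression rather than only its first alternative.
        string pattern = "ab";
        var anchored = new Regex(@"\G(?:" + pattern + ")");

        const string input = "ababab";
        for (int i = 0; i < input.Length; i++)
        {
            // The same Regex instance is reused for every position;
            // no substring is copied and no new regex is compiled.
            Console.WriteLine($"{i}: {MatchesAt(anchored, input, i)}");
        }
    }
}
```

For the input ababab, this reports a match only at indices 0, 2 and 4.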
