Why does this Regex fail when a file has \r\n terminators at the end of each line?

Time:09-11

I'm having problems trying to understand why my Regex is wonky in some code of mine and I'd love to have someone out there tell me what bit of Regexy mystery I'm missing!

Basically, I have a file that I want to search for text matching a Regex, which I then need to process further. To do this I load the file, run the match, and then handle whatever I find.

The issue is that this was working great for most of my files, except one. That file differs from all the others only in that it has \r\n as the line terminators and not \n (took me a dog's age to grok that one!)

Here's what I got:

File contains:

/***************************************
File:               dim.Warehouse.sql
Dependencies:       dim.Currencies.sql
****************************************/
Select Blah, Blah, Blah
  From A.Table

Code

var regexStr = @"^([ \t]*Dependencies[\t ]*:[\t ]*)(([\\*?#\w.]+\.sql[\t ]*)+)$";
var re = new Regex(regexStr, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline, TimeSpan.FromSeconds(10));
var fileContents1 = File.ReadAllText("/Users/rambler/tmp/Quick/ThisFile.sql");
var fileContents2 = File.ReadAllText("/Users/rambler/tmp/Quick/ThisFile.sql").Replace("\r", "");

var m1 = re.Matches(fileContents1);
var m2 = re.Matches(fileContents2);

Why does m1.Count = 0 and m2.Count = 1 here? What do I need to do with my Regex to handle a file with both \r\n and \n line terminators, since doing it this way feels awfully kludgy...

CodePudding user response:

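From the documentation, you need to handle the carriage return \r yourself. In Multiline mode, $ matches just before a \n, so with \r\n terminators the \r is left sitting between .sql and the end of the line, and the [\t ]* in your pattern does not consume it.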

If you specify the RegexOptions.Multiline option, it matches either the newline character (\n) or the end of the input string. It does not, however, match the carriage return/line feed character combination. To successfully match them, use the subexpression \r?$ instead of just $.
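
Applied to the pattern from the question, that could look like the sketch below. It is a minimal example under a few assumptions: the two `+` quantifiers are assumed (the pattern needs them to match the sample line), and the `crlf` and `lf` strings are hypothetical in-memory stand-ins for the file contents rather than the real file.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the question with \r? inserted before $, so that a carriage
// return left over from a \r\n terminator is consumed outside the capture groups.
// Assumption: the "+" quantifiers are what the original pattern used.
var regexStr = @"^([ \t]*Dependencies[\t ]*:[\t ]*)(([\\*?#\w.]+\.sql[\t ]*)+)\r?$";
var re = new Regex(
    regexStr,
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline,
    TimeSpan.FromSeconds(10));

// Hypothetical sample data: the same header with \r\n and with \n terminators.
var crlf = "File:               dim.Warehouse.sql\r\n" +
           "Dependencies:       dim.Currencies.sql\r\n" +
           "Select Blah, Blah, Blah\r\n";
var lf = crlf.Replace("\r", "");

// Both variants should now produce one match each.
Console.WriteLine(re.Matches(crlf).Count);
Console.WriteLine(re.Matches(lf).Count);

// Group 2 holds the dependency list without a trailing \r.
Console.WriteLine(re.Match(crlf).Groups[2].Value);
```

Because \r? sits outside the capturing groups, the captured file name stays clean for either kind of line ending, so the Replace("\r", "") step should no longer be needed.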
