(For anyone who runs into the same problem: note that this behavior may be specific to .NET and C#. See Wiktor's answer below.)
Before asking this question, I read many related questions (including "Match linebreaks - \n or \r\n?"), but none of their answers worked.
In my case, I want to remove all // comments from some code files. To handle files from Mac, Unix, and Windows, I need to match the text between // and \r, \n, or \r\n.
Here is the test content of a code file:
var text = "int rn = 0; //comment1.0\r\n" +
           "int r = 0; //comment2.\r" +
           "int n = 0; //comment3.\n" +
           "end";
var txt = text.RemoveLineEndComment();
And here is the regex (if you don't write C#, just focus on the regex):
using System.Text.RegularExpressions;

public static class CommentRemover
{
    private static readonly Regex RegexRemoveLineEndComment =
        new(@"\/\/.*$", RegexOptions.Multiline);

    public static string RemoveLineEndComment(this string text)
    {
        var t = RegexRemoveLineEndComment.Match(text).Value; // first match, for inspection while debugging
        return RegexRemoveLineEndComment.Replace(text, string.Empty);
    }
}
What I need is txt = "int rn = 0; \r\nint r = 0; \rint n = 0; \nend". Here are the regexes I tried and the corresponding results:
//.*$ => txt="int rn = 0; \nint r = 0; \nend" (int n = 0 is missing)
//.*(?=\r\n) => txt="int rn = 0; \r\nint r = 0; //comment2.\rint n = 0; //comment3.\nend" (comment2 and 3 are left)
//.*(?=\r?\n?) => txt="int rn = 0; \nint r = 0; \nend" (int n = 0 is missing)
//.*(?=(\r\n|\r|\n)) => txt="int rn = 0; \nint r = 0; \nend" (int n = 0 is missing)
//.*(?=[\r\n|\r|\n]) => txt="int rn = 0; \nint r = 0; \nend" (int n = 0 is missing) ...
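To make the results above easy to reproduce, here is a small sketch that runs each attempted pattern (all with RegexOptions.Multiline, as in the class above) against the same test string and prints the output with CR and LF made visible. It assumes .NET 5+ (top-level statements); the Escape helper is hypothetical and only exists for display.

```csharp
using System;
using System.Text.RegularExpressions;

// Same test content as in the question, joined with "+".
var text = "int rn = 0; //comment1.0\r\n" +
           "int r = 0; //comment2.\r" +
           "int n = 0; //comment3.\n" +
           "end";

// The patterns tried above, each used with RegexOptions.Multiline.
string[] patterns =
{
    @"//.*$",
    @"//.*(?=\r\n)",
    @"//.*(?=\r?\n?)",
    @"//.*(?=(\r\n|\r|\n))",
    @"//.*(?=[\r\n|\r|\n])",
};

var results = new string[patterns.Length];
for (int i = 0; i < patterns.Length; i++)
{
    results[i] = Regex.Replace(text, patterns[i], string.Empty, RegexOptions.Multiline);
    Console.WriteLine($"{patterns[i],-24} => {Escape(results[i])}");
}

// Hypothetical helper: shows CR/LF as visible escape sequences.
static string Escape(string s) => s.Replace("\r", "\\r").Replace("\n", "\\n");
```

Every pattern except the second lets .* run across the lone \r, which is why "int n = 0;" disappears together with comment2.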
It seems something is wrong with \r: it is not recognized. If I only work with \r\n, the regex "//.*(?=\r\n)" works fine for the test content below:
var text = "int rn = 0; //comment1.0\r\n" +
           "int r = 0; //comment2.\r\n" +
           "int n = 0; //comment3.\r\n" +
           "end";
Can anyone help me out? Thanks for any help.
Answer:
In .NET, the "." pattern matches carriage return (CR) chars: it matches any char except an LF char (and with RegexOptions.Singleline it matches LF too). Note there is no option or modifier that makes "." stop matching CR.
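The sketch below checks this behavior directly, together with the related fact that "$" in RegexOptions.Multiline mode matches only before an LF (or at the end of the input), never before a lone CR. It assumes .NET 5+ and uses only Regex.IsMatch and Regex.Match.

```csharp
using System;
using System.Text.RegularExpressions;

// "." matches a carriage return...
bool dotMatchesCr = Regex.IsMatch("\r", ".");
// ...but not a line feed, unless RegexOptions.Singleline is set.
bool dotMatchesLf = Regex.IsMatch("\n", ".");
bool dotMatchesLfSingleline = Regex.IsMatch("\n", ".", RegexOptions.Singleline);

// In Multiline mode, "$" matches before "\n", but not before a lone "\r".
bool dollarBeforeLf = Regex.IsMatch("a\nb", "a$", RegexOptions.Multiline);
bool dollarBeforeCr = Regex.IsMatch("a\rb", "a$", RegexOptions.Multiline);

// Consequence: on a CRLF line, "//.*$" also swallows the "\r".
string matched = Regex.Match("x //c\r\ny", "//.*$", RegexOptions.Multiline).Value;

Console.WriteLine($"dot matches CR: {dotMatchesCr}, LF: {dotMatchesLf}, LF with Singleline: {dotMatchesLfSingleline}");
Console.WriteLine($"$ before LF: {dollarBeforeLf}, $ before CR: {dollarBeforeCr}");
Console.WriteLine($"match includes CR: {matched == "//c\r"}");
```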
Thus, you can use
var RegexRemoveLineEndComment = new Regex(@"//[^\r\n]*", RegexOptions.Multiline);
See the C# demo.
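A self-contained sketch applying the suggested pattern to the test content from the question, assuming .NET 5+. It keeps the question's name RemoveLineEndComment but writes it as a local function instead of an extension method, and it drops RegexOptions.Multiline, which has no effect here because the pattern uses neither ^ nor $.

```csharp
using System;
using System.Text.RegularExpressions;

// "[^\r\n]*" stops at the first CR or LF, so every line-ending style is preserved.
var lineEndComment = new Regex(@"//[^\r\n]*");

string RemoveLineEndComment(string s) => lineEndComment.Replace(s, string.Empty);

var text = "int rn = 0; //comment1.0\r\n" +
           "int r = 0; //comment2.\r" +
           "int n = 0; //comment3.\n" +
           "end";

var txt = RemoveLineEndComment(text);

// Print with CR/LF made visible.
Console.WriteLine(txt.Replace("\r", "\\r").Replace("\n", "\\n"));
```

Like every pattern in the question, this is purely textual, so it also strips a // that appears inside a string literal such as "http://example.com".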
If you also want to remove the whitespace before //, add \s* (any whitespace) or [\p{Zs}\t]* (horizontal whitespace) at the start of the pattern.
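A sketch comparing the two prefixes, assuming .NET 5+. Because \s* also matches CR and LF, it can reach back across line breaks (including blank lines) that precede a comment, while [\p{Zs}\t]* only consumes spaces and tabs on the comment's own line. The sample input is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Horizontal whitespace only (Unicode space separators and tabs), then the comment.
var sameLine = new Regex(@"[\p{Zs}\t]*//[^\r\n]*");
// "\s*" also matches CR/LF, so it can swallow preceding line breaks.
var anyWhitespace = new Regex(@"\s*//[^\r\n]*");

var code = "int a = 1; // one\r\n\r\n// two\nend";

string kept = sameLine.Replace(code, string.Empty);
string merged = anyWhitespace.Replace(code, string.Empty);

Console.WriteLine(kept.Replace("\r", "\\r").Replace("\n", "\\n"));
Console.WriteLine(merged.Replace("\r", "\\r").Replace("\n", "\\n"));
```

With [\p{Zs}\t]* the blank line and all line endings survive; with \s* the "\r\n\r\n" before "// two" is removed along with the comment.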