Identify line endings with Regex (.NET and C#)

Time:10-22

(For those who run into the same case: please note that this problem may be specific to .NET and C#. See Wiktor's answer below.)

Before asking this question, I've read many related questions (including this: Match linebreaks - \n or \r\n?), but none of those answers worked.

In my case, I want to remove all // comments from some code files. To handle files from Mac, Unix, and Windows, I need something that matches the text between // and \r, \n, or \r\n.

Here is the test content of a code file:

        var text = "int rn = 0; //comment1.0\r\n" +
                   "int r = 0; //comment2.\r" +
                   "int n = 0; //comment3.\n" +
                   "end";
        var txt = text.RemoveLineEndComment();

And here is the regex (if you are not a C# developer, please just focus on the regex):

using System.Text.RegularExpressions;

public static class CommentRemover
{
    // Pattern under test: "//" followed by everything up to the end of the line.
    private static readonly Regex RegexRemoveLineEndComment =
        new(@"\/\/.*$", RegexOptions.Multiline);

    // Extension method: removes every match of the pattern from the text.
    public static string RemoveLineEndComment(this string text)
    {
        return RegexRemoveLineEndComment.Replace(text, string.Empty);
    }
}

What I need is txt = "int rn = 0; \r\nint r = 0; \rint n = 0; \nend". Here are the regexes I tried and the corresponding results:

//.*$ => txt="int rn = 0; \nint r = 0; \nend" (int n = 0 is missing)

//.*(?=\r\n) => txt="int rn = 0; \r\nint r = 0; //comment2.\rint n = 0; //comment3.\nend" (comment2 and 3 are left)

//.*(?=\r?\n?) => txt="int rn = 0; \nint r = 0; \nend" (int n = 0 is missing)

//.*(?=(\r\n|\r|\n)) => txt="int rn = 0; \nint r = 0; \nend" (int n = 0 is missing)

//.*(?=[\r\n|\r|\n]) => txt="int rn = 0; \nint r = 0; \nend" (int n = 0 is missing) ...

It seems that something is wrong with \r and it is not recognized as a line end. If I only work with \r\n, the regex "//.*(?=\r\n)" works fine for the test content below:

        var text = "int rn = 0; //comment1.0\r\n" +
                   "int r = 0; //comment2.\r\n" +
                   "int n = 0; //comment3.\r\n" +
                   "end";

Can anyone help me out? Thanks for any help.

CodePudding user response:

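In .NET, the . pattern matches carriage return (CR) chars: it matches any char except LF. With RegexOptions.Multiline, $ matches only before an LF (or at the end of the string), so .*$ swallows a lone \r and everything after it up to the next \n.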

Note that there is no option or modifier that makes . stop at CR.
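The behavior described above can be checked with a small sketch. The class name DotNewlineDemo and the sample strings are made up for illustration; only Regex.Match and RegexOptions.Multiline come from .NET itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotNewlineDemo
{
    // Returns what ".*" matches at the first possible position.
    public static string GreedyDot(string input) =>
        Regex.Match(input, ".*").Value;

    // Returns what "x.*$" matches in Multiline mode.
    public static string DotThenLineEnd(string input) =>
        Regex.Match(input, "x.*$", RegexOptions.Multiline).Value;

    public static void Main()
    {
        // "." consumes the CR but stops at the LF.
        Console.WriteLine(GreedyDot("ab\rcd\nef") == "ab\rcd"); // True

        // "$" matches just before the LF, so the CR ends up inside the match.
        Console.WriteLine(DotThenLineEnd("x\r\ny") == "x\r");   // True
    }
}
```

This is why the patterns in the question return "\n" where "\r\n" was expected: the \r is removed together with the comment.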

Thus, you can use

var RegexRemoveLineEndComment = new Regex(@"//[^\r\n]*", RegexOptions.Multiline);

See the C# demo.
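For reference, here is a self-contained sketch that applies the [^\r\n] pattern to the test content from the question. The names LineCommentStripper and Strip are hypothetical, and RegexOptions.Multiline is left out on the assumption that it is not needed, since the pattern contains no ^ or $ anchors.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineCommentStripper
{
    // "//" followed by any run of chars that are neither CR nor LF.
    private static readonly Regex LineComment = new Regex(@"//[^\r\n]*");

    // Removes every line-end comment and keeps the original line breaks.
    public static string Strip(string text) =>
        LineComment.Replace(text, string.Empty);

    public static void Main()
    {
        var text = "int rn = 0; //comment1.0\r\n" +
                   "int r = 0; //comment2.\r" +
                   "int n = 0; //comment3.\n" +
                   "end";
        var expected = "int rn = 0; \r\nint r = 0; \rint n = 0; \nend";

        Console.WriteLine(Strip(text) == expected); // True
    }
}
```

Keep in mind that a pure regex approach like this also removes // sequences that appear inside string literals (for example in a URL), so it is only safe for code where that cannot happen.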

If you also want to remove the whitespace before //, add \s* (any whitespace) or [\p{Zs}\t]* (horizontal whitespace only) at the start of the pattern.
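The two whitespace variants differ when a comment sits on a line of its own, because \s also matches line breaks. A small sketch of the difference, with the hypothetical names TrailingCommentTrimmer, TrimAny, and TrimHorizontal and a made-up sample string:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingCommentTrimmer
{
    // \s* may also consume the line break in front of an indented comment.
    private static readonly Regex AnyWhitespace =
        new Regex(@"\s*//[^\r\n]*");

    // [\p{Zs}\t]* only consumes spaces and tabs, never CR or LF.
    private static readonly Regex HorizontalOnly =
        new Regex(@"[\p{Zs}\t]*//[^\r\n]*");

    public static string TrimAny(string text) =>
        AnyWhitespace.Replace(text, string.Empty);

    public static string TrimHorizontal(string text) =>
        HorizontalOnly.Replace(text, string.Empty);

    public static void Main()
    {
        var text = "a;\n  // c\nb";

        // The comment line disappears entirely, including its line break.
        Console.WriteLine(TrimAny(text) == "a;\nb");          // True

        // The comment is removed, but an empty line is left behind.
        Console.WriteLine(TrimHorizontal(text) == "a;\n\nb"); // True
    }
}
```

Which variant fits depends on whether the line count of the file has to stay the same after the comments are removed.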
