Regex in .net seems to not work correctly

Time:07-13

I want to strip HTML from a string with a regular expression. This regex works everywhere else, but it does not work in .NET, and I don't understand why.

using System;
                    
public class Program
{
    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";
        var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "");
        Console.WriteLine(res);
    }
}

CodePudding user response:

You're missing the correct Regex option:

var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "", System.Text.RegularExpressions.RegexOptions.Singleline);

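You need this because your HTML contains a newline (\n). By default, . matches every character except \n, so the lazy <.*?> stops at the line break and never reaches the closing >. RegexOptions.Singleline makes . match newline characters as well.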

Docs blurb:

Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n). For more information, see the "Single-line Mode" section in the Regular Expression Options article.
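
To make the difference concrete, here is a minimal sketch that runs the pattern from the question with and without the option. The class and method names (SinglelineDemo, WithDefaultOptions, and so on) are made up for illustration. The inline (?s) form is an equivalent way to switch on single-line mode inside the pattern itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Same input as in the question: the tag spans a newline.
    public const string Sample =
        "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";

    // Default options: '.' does not match '\n', so the tag is never matched.
    public static string WithDefaultOptions(string s) =>
        Regex.Replace(s, "<.*?>", "");

    // Singleline: '.' matches every character, including '\n'.
    public static string WithSingleline(string s) =>
        Regex.Replace(s, "<.*?>", "", RegexOptions.Singleline);

    // Inline option (?s): same effect as RegexOptions.Singleline.
    public static string WithInlineOption(string s) =>
        Regex.Replace(s, "(?s)<.*?>", "");

    public static void Main()
    {
        Console.WriteLine(WithDefaultOptions(Sample) == Sample); // True: nothing was removed
        Console.WriteLine(WithSingleline(Sample));               // FOO  BAR
        Console.WriteLine(WithInlineOption(Sample));             // FOO  BAR
    }
}
```

Note the two spaces in the output: the spaces on either side of the removed tag are both kept.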


CodePudding user response:

Try this:

System.Text.RegularExpressions.Regex.Replace(text, "<[^>]*>", "");

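This strips the HTML tags from your string. The negated character class [^>] matches any character other than >, including \n, so no regex option is needed.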
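
As a rough sketch of how this pattern could be wrapped for reuse, the helper below keeps the regex in a static field. HtmlTagStripper and StripTags are hypothetical names, not part of .NET. It assumes no attribute value contains a literal > character, and like any regex approach it is not a full HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTagStripper
{
    // [^>]* matches any run of characters except '>', newlines included,
    // so RegexOptions.Singleline is not required here.
    private static readonly Regex TagPattern =
        new Regex("<[^>]*>", RegexOptions.Compiled);

    // Removes everything that looks like a tag and keeps the text between tags.
    public static string StripTags(string input) =>
        TagPattern.Replace(input, "");

    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";
        Console.WriteLine(StripTags(text)); // FOO  BAR
    }
}
```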
