I want to strip HTML from a string with a regular expression. The regex works elsewhere, but it does not work in .NET, and I don't understand why.
using System;

public class Program
{
    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";
        var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "");
        Console.WriteLine(res);
    }
}
Answer:
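You're missing a regex option. Since your program only has `using System;`, spell out the namespace for RegexOptions as well:
var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "", System.Text.RegularExpressions.RegexOptions.Singleline);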
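You need this because your HTML contains a newline (\n). By default, . matches every character except \n, so <.*?> can never get from <span to the closing > on the next line, and nothing is replaced. Singleline makes . match newline characters too.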
From the documentation for RegexOptions.Singleline:
Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n). For more information, see the "Single-line Mode" section in the Regular Expression Options article.
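A minimal sketch that runs the question's string through the pattern with and without single-line mode. The class name `SinglelineDemo` and the helper `Strip` are hypothetical; `(?s)` is .NET's inline syntax for the same mode, shown as an alternative to passing the flag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical helper: removes every match of the lazy pattern <.*?>
    public static string Strip(string input, RegexOptions options) =>
        Regex.Replace(input, "<.*?>", "", options);

    public static void Main()
    {
        // "\n" in a regular C# string literal is a real newline character
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";

        // Default options: '.' stops at '\n', so the tag is never matched and the text is unchanged
        Console.WriteLine(Strip(text, RegexOptions.None));

        // Singleline: '.' also matches '\n', so the whole <span ...> tag is removed
        Console.WriteLine(Strip(text, RegexOptions.Singleline)); // prints "FOO  BAR"

        // Inline (?s) switches on the same mode from inside the pattern
        Console.WriteLine(Regex.Replace(text, "(?s)<.*?>", ""));
    }
}
```

The two spaces in "FOO  BAR" are the ones that surrounded the tag; only the tag itself is removed.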
Answer:
Try this:
System.Text.RegularExpressions.Regex.Replace(text, "<[^>]*>", "");
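This strips the HTML tags from your string. It works without RegexOptions.Singleline because the negated character class [^>] matches any character other than >, including \n.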
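A minimal sketch of the negated-class approach with default options, reusing one Regex instance. The names `NegatedClassDemo`, `Tag`, and `StripTags` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // [^>]* matches any run of characters other than '>', newlines included,
    // so no RegexOptions flag is needed
    private static readonly Regex Tag = new Regex("<[^>]*>");

    public static string StripTags(string input) => Tag.Replace(input, "");

    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";
        Console.WriteLine(StripTags(text)); // prints "FOO  BAR"
    }
}
```

Like the lazy pattern, this ends a match at the first >, so a literal > inside an attribute value would cut the tag short.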