I have a string:
`<p onclick =" alert('abc') "style =" color: black "> text </p>`
I want to remove all inline JavaScript, such as onclick, onchange, and so on,
leaving only the HTML and CSS. Is there any way to do this in C#? The only way I can think of is to remove each JavaScript attribute from the string.
Input: <p onclick =" alert('abc') "style =" color: black "> text </p>
Output: <p style =" color: black "> text </p>
CodePudding user response:
You can use HtmlSanitizer to remove the inline JavaScript from the provided HTML fragment.
For example, the following code
// using Ganss.Xss; (the namespace is Ganss.XSS in older HtmlSanitizer releases)
var sanitizer = new HtmlSanitizer();
// The verbatim string pieces must be joined with +, and the event attribute is "onload".
var html = @"<script>alert('xss')</script><div onload=""alert('xss')"""
    + @"style=""background-color: test"">Test<img src=""test.gif"""
    + @"style=""background-image: url(javascript:alert('xss')); margin: 10px""><p onclick =""alert('abc')"" style =""color: black"">text</p></div>";
var sanitized = sanitizer.Sanitize(html);
returns the output as
<div>Test<img src="test.gif" style="margin: 10px"><p style="color: rgba(0, 0, 0, 1)">text</p></div>
You can check this fiddle for more details.
CodePudding user response:
The best way is to use Html Agility Pack. I have linked the page you need in its documentation.
Use it like this:
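// using HtmlAgilityPack; html holds the markup from the question
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
// SelectSingleNode returns null when no <p> element is found
var pNode = htmlDoc.DocumentNode.SelectSingleNode("//p");
pNode?.Attributes.Remove("onclick");
// Serialize the modified document back to a string
var result = htmlDoc.DocumentNode.OuterHtml;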
Here is the fiddle.
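The snippet above removes only onclick from the first p element, while the question asks about every handler (onclick, onchange, ...). A minimal sketch of how that could be generalized with Html Agility Pack is shown here. The class name InlineScriptStripper and the method StripInlineScripts are hypothetical names made up for illustration, and the rule "any attribute whose name starts with on is an event handler" is an assumption, not something the library defines. It also drops script elements, but it does not look at javascript: URLs inside href, src, or style values, so for untrusted input a dedicated sanitizer is probably the safer choice.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class InlineScriptStripper
{
    public static string StripInlineScripts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Drop <script> elements. ToList() takes a snapshot so that
        // nodes can be removed while iterating.
        foreach (var script in doc.DocumentNode.Descendants("script").ToList())
        {
            script.Remove();
        }

        // Assumption: every attribute whose name starts with "on"
        // (onclick, onchange, onload, ...) is an inline event handler.
        foreach (var node in doc.DocumentNode.Descendants().ToList())
        {
            if (!node.HasAttributes)
            {
                continue;
            }

            foreach (var attr in node.Attributes.ToList())
            {
                if (attr.Name.StartsWith("on", StringComparison.OrdinalIgnoreCase))
                {
                    attr.Remove();
                }
            }
        }

        // Serialize the cleaned document back to a string.
        return doc.DocumentNode.OuterHtml;
    }
}
```

It could be called as `var cleaned = InlineScriptStripper.StripInlineScripts(html);`. Other attributes such as style are left in place, since the loop only touches names beginning with on.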