I need to write an ASP.NET C# function that removes all image attributes except "src", "align", "alt" and "title". The function must only remove attributes from <img> tags and leave the rest of the markup untouched. The input is the HTML used for displaying articles, and I need to clean up the image attributes in it.
public static string FixImageAttributes(string html)
{
    // Remove all attributes from the <img> tags in html here, except: "src", "align", "alt" and "title".
    // (html-string is not a valid C# identifier, so the parameter is named html.)
    return html;
}
Example:
If the function input is this:
<html>
<body>
<div>
<h1>Some html here</h1>
<p><img align="right" title="" border="0" hspace="7" alt="" vspace="7" src="/upload/content/images/bla/bla/test.jpg"></p>
</div>
<div>
<h2>Lorem impum</h2>
<p><img src="/upload/content/test/blah/image.jpg" width="624" height="255" alt="Text here" title="Hello" border="0" vspace="0" hspace="0"></p>
</div>
</body>
</html>
The function output should be this:
<html>
<body>
<div>
<h1>Some html here</h1>
<p><img align="right" title="" alt="" src="/upload/content/images/bla/bla/test.jpg"></p>
</div>
<div>
<h2>Lorem impum</h2>
<p><img src="/upload/content/test/blah/image.jpg" alt="Text here" title="Hello"></p>
</div>
</body>
</html>
Answer:
You can use HtmlAgilityPack for this and write something like this:
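using System.Collections.Generic;
using System.Linq;

public string RemoveAllAttributesFromEveryNode(string html)
{
    var htmlDocument = new HtmlAgilityPack.HtmlDocument();
    htmlDocument.LoadHtml(html);
    var filterList = new List<string> { "src", "align", "alt", "title" };
    // "//*" selects every element, so this strips attributes from all tags, not only <img>.
    // Use "//img" instead to limit it to image tags. SelectNodes returns null when nothing matches.
    var nodes = htmlDocument.DocumentNode.SelectNodes("//*");
    if (nodes != null)
    {
        foreach (var node in nodes)
        {
            // Compare the attribute's Name with the whitelist; ToList() copies the matches
            // so the attribute collection is not modified while it is being enumerated.
            var toRemove = node.Attributes.Where(x => !filterList.Contains(x.Name)).ToList();
            foreach (var attribute in toRemove)
            {
                attribute.Remove();
            }
        }
    }
    return htmlDocument.DocumentNode.OuterHtml;
}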
More about HtmlAgilityPack can be found here:
https://html-agility-pack.net/?z=codeplex
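If adding the HtmlAgilityPack NuGet package is not an option, a rough fallback is a regular-expression pass that rewrites only <img> tags. This is a minimal sketch under stated assumptions, not a real HTML parser: the class name ImageAttributeCleaner is hypothetical, and the code assumes attribute values are double-quoted, single-quoted or unquoted without spaces, and never contain a ">" character. For arbitrary or untrusted HTML, a parser such as HtmlAgilityPack is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: a regex-based cleaner for <img> attributes.
// ASSUMPTION: attribute values never contain ">" and are quoted or unquoted without spaces.
public static class ImageAttributeCleaner
{
    // Whitelisted attribute names, compared case-insensitively (HTML attribute names are case-insensitive).
    private static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "src", "align", "alt", "title" };

    // An opening <img ...> tag; "attrs" is everything up to the closing ">" or "/>".
    private static readonly Regex ImgTag =
        new Regex(@"<img\b(?<attrs>[^>]*?)(?<end>/?)>", RegexOptions.IgnoreCase);

    // One attribute: a name, optionally followed by "=" and a quoted or unquoted value.
    private static readonly Regex Attribute =
        new Regex(@"(?<name>[^\s=/>]+)(?:\s*=\s*(?<value>""[^""]*""|'[^']*'|[^\s""'>]+))?");

    public static string FixImageAttributes(string html)
    {
        // Only <img> tags are rewritten; all other markup is copied through unchanged.
        return ImgTag.Replace(html, tag =>
        {
            var kept = new List<string>();
            foreach (Match attr in Attribute.Matches(tag.Groups["attrs"].Value))
            {
                string name = attr.Groups["name"].Value;
                if (!Allowed.Contains(name))
                {
                    continue; // drop border, hspace, vspace, width, height, ...
                }
                // Rebuild as name=value (or a bare name), which also drops stray spaces around "=".
                kept.Add(attr.Groups["value"].Success ? name + "=" + attr.Groups["value"].Value : name);
            }

            string attributes = kept.Count > 0 ? " " + string.Join(" ", kept) : "";
            string close = tag.Groups["end"].Value == "/" ? " />" : ">";
            return "<img" + attributes + close;
        });
    }
}
```

Applied to the question's sample, the two <img> tags come out as in the expected output, while the <div>, <h1>, <h2> and <p> elements are left alone.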
Answer:
I modified Ran Turner's answer a bit so that it only touches <img> elements:
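using System.Linq;
using HtmlAgilityPack;

public static string RemoveAllAttributesFromImgNode(string html)
{
    var htmlDocument = new HtmlDocument();
    htmlDocument.LoadHtml(html);
    string[] filter = { "src", "align", "alt", "title" };
    // Only <img> elements are selected; SelectNodes returns null if the document has none.
    var nodes = htmlDocument.DocumentNode.SelectNodes("//img");
    if (nodes == null)
    {
        return html;
    }
    foreach (var node in nodes)
    {
        // x.Name is already a string; ToList() copies the matches so the
        // attribute collection is not modified while it is being enumerated.
        var attributes = node.Attributes.Where(x => !filter.Contains(x.Name)).ToList();
        foreach (var attribute in attributes)
        {
            node.Attributes.Remove(attribute);
        }
    }
    return htmlDocument.DocumentNode.OuterHtml;
}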
Output from console:
Old HTML:
<html>
<body>
<div>
<h1>Some html here</h1>
<p>
<img align="right" title="" border="0" hspace="7" alt="" vspace="7" src="/upload/content/images/bla/bla/test.jpg">
</p>
</div>
<div>
<h2> Lorem impum</h2 >
<p>
<img src="/upload/content/test/blah/image.jpg" width="624" height="255" alt="Text here" title ="Hello" border="0" vspace="0" hspace="0">
</p>
</div>
</body>
</html>
===============================
New HTML:
<html>
<body>
<div>
<h1>Some html here</h1>
<p>
<img align="right" title="" alt="" src="/upload/content/images/bla/bla/test.jpg">
</p>
</div>
<div>
<h2> Lorem impum</h2>
<p>
<img src="/upload/content/test/blah/image.jpg" alt="Text here" title="Hello">
</p>
</div>
</body>