How can I remove image attributes in html code

Time:09-23

I need to write an ASP.NET C# function that removes all attributes from image tags except "src", "align", "alt" and "title". The function must only modify content inside image tags. The input is HTML used for displaying articles, in which I need to clean up the image attributes.

public static string FixImageAttributes(string htmlString)
{
    // Remove all attributes from the img tags in htmlString here, except: "src", "align", "alt" and "title".

    return htmlString;
}

Example:

If the function input (htmlString) is this:

<html>
<body>
<div>
<h1>Some html here</h1>
<p><img align="right" title="" border="0" hspace="7" alt="" vspace="7" src="/upload/content/images/bla/bla/test.jpg"></p>
</div>
<div>
<h2>Lorem impum</h2>
<p><img src="/upload/content/test/blah/image.jpg" width="624" height="255" alt="Text here" title="Hello" border="0" vspace="0" hspace="0"></p>
</div>
</body>
</html>

The function output should be this:

<html>
<body>
<div>
<h1>Some html here</h1>
<p><img align="right" title="" alt="" src="/upload/content/images/bla/bla/test.jpg"></p>
</div>
<div>
<h2>Lorem impum</h2>
<p><img src="/upload/content/test/blah/image.jpg" alt="Text here" title="Hello"></p>
</div>
</body>
</html>

CodePudding user response:

You can use HtmlAgilityPack for this and write something like this:

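using System.Collections.Generic;
using System.Linq;

public string RemoveAllAttributesFromEveryNode(string html)
{
    var htmlDocument = new HtmlAgilityPack.HtmlDocument();
    htmlDocument.LoadHtml(html);

    // Attributes to keep; everything else is removed.
    var filterList = new List<string> { "src", "align", "alt", "title" };

    // Note: "//*" selects every element, not only <img>.
    foreach (var node in htmlDocument.DocumentNode.SelectNodes("//*"))
    {
        // Compare by attribute name; ToList() takes a snapshot so removing while looping is safe.
        var toRemove = node.Attributes.Where(x => !filterList.Contains(x.Name)).ToList();
        foreach (var attribute in toRemove)
        {
            attribute.Remove();
        }
    }

    html = htmlDocument.DocumentNode.OuterHtml;

    return html;
}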

More about HtmlAgilityPack can be found here:

https://html-agility-pack.net/?z=codeplex

CodePudding user response:

I modified Ran Turner's answer a bit so that it only processes img nodes:

using System.Linq;
using HtmlAgilityPack;

public static string RemoveAllAttributesFromImgNode(string html)
{
    var htmlDocument = new HtmlDocument();
    htmlDocument.LoadHtml(html);

    // Attributes to keep on each <img>.
    string[] filter = { "src", "align", "alt", "title" };

    // SelectNodes returns null when nothing matches, so return the input unchanged.
    var nodes = htmlDocument.DocumentNode.SelectNodes("//img");
    if (nodes == null)
    {
        return html;
    }

    foreach (var node in nodes)
    {
        // ToList() takes a snapshot so attributes can be removed while looping.
        var attributes = node.Attributes.Where(x => !filter.Contains(x.Name)).ToList();
        foreach (var attribute in attributes)
        {
            node.Attributes.Remove(attribute);
        }
    }

    return htmlDocument.DocumentNode.OuterHtml;
}
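
The console output below could be produced by a small driver along these lines. This is only a minimal sketch: it assumes the HtmlAgilityPack NuGet package is installed and that RemoveAllAttributesFromImgNode sits in a static class called HtmlCleaner, which is a hypothetical name chosen for illustration. The sample input is shortened to a single image from the question.

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        // Sample input based on the second image in the question.
        const string oldHtml =
            "<p><img src=\"/upload/content/test/blah/image.jpg\" width=\"624\" height=\"255\" " +
            "alt=\"Text here\" title=\"Hello\" border=\"0\" vspace=\"0\" hspace=\"0\"></p>";

        // Assumption: HtmlCleaner is a hypothetical static class that contains
        // the RemoveAllAttributesFromImgNode method shown above.
        string newHtml = HtmlCleaner.RemoveAllAttributesFromImgNode(oldHtml);

        // Print the markup before and after cleaning, with a separator line.
        Console.WriteLine("Old HTML:");
        Console.WriteLine(oldHtml);
        Console.WriteLine("===============================");
        Console.WriteLine("New HTML:");
        Console.WriteLine(newHtml);
    }
}
```

Printing both versions side by side makes it easy to check that only width, height, border, vspace and hspace were dropped from the img tag.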

Output from console:

Old HTML:

<html>
<body>
<div>
<h1>Some html here</h1>
<p>
<img align="right" title="" border="0" hspace="7" alt="" vspace="7" src="/upload/content/images/bla/bla/test.jpg">
</p>
</div>
<div>
<h2> Lorem impum</h2 >
<p>
<img src="/upload/content/test/blah/image.jpg" width="624" height="255" alt="Text here" title ="Hello" border="0" vspace="0" hspace="0">
</p>
</div>
</body>
</html>

===============================

New HTML:

<html>
<body>
<div>
<h1>Some html here</h1>
<p>
<img align="right" title="" alt="" src="/upload/content/images/bla/bla/test.jpg">
</p>
</div>
<div>
<h2> Lorem impum</h2>
<p>
<img src="/upload/content/test/blah/image.jpg" alt="Text here" title="Hello">
</p>
</div>
</body>
</html>