How to remove all comments from an XML file except the ones containing the word "HELP"?

Time:08-07

I need an XML matching expression that lets me strip all comments from a file except the ones containing the word HELP.

I am currently using Perl to strip multiline comments (such as failed tests) from an XML file, except the hand-picked ones that are helpful to the end user:
perl -i -w -0777pe 's/<!--(.(?<!(HELP|TODO)))*?-->//sg' somefile.xml

If there is a way to get the same result with an XML matching expression, I would prefer that, since there may be edge cases the regex does not handle. For now, this is what I have to use.

Note: I will use it with xmlstarlet on Linux, so it would be better if the solution works with it too.

CodePudding user response:

Here is an XSLT-based solution.

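The XSLT is very simple. A single one-line template removes the unwanted comments. The rest is just boilerplate for the so-called Identity Transform pattern.

To apply the stylesheet with xmlstarlet (the binary is named xml or xmlstarlet, depending on the distribution):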

$ xml tr path/to/transformation.xslt path/to/source.xml

Input XML

<?xml version="1.0"?>
<root>
    <!--The Sunshine State-->
    <state>FL</state>
    <!--HELP is needed-->
    <city>Miami</city>
</root>

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="no"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!--removes every comment that does not contain HELP; the rest are copied by the identity template-->
    <xsl:template match="comment()[not(contains(., 'HELP'))]" />
</xsl:stylesheet>

xmlstarlet (no stylesheet needed; prints to stdout, add -L after ed to edit the file in place)
xmlstarlet ed -d '//comment()[not(contains(.,"HELP"))]' path/to/source.xml

Output

<?xml version='1.0' encoding='utf-8' ?>
<root>
  <state>FL</state>
  <!--HELP is needed-->
  <city>Miami</city>
</root>
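
If xmlstarlet is not available, the same filter can be sketched with Python's standard library. This is not the XSLT/XPath method shown above but a plain DOM walk with xml.dom.minidom that applies the same test (drop a comment unless it contains a keep word). The function name strip_comments and its keep_words parameter are hypothetical names chosen for illustration; passing ("HELP", "TODO") mirrors the two words in the Perl regex from the question.

```python
from xml.dom import minidom


def strip_comments(xml_text, keep_words=("HELP",)):
    """Return xml_text with every comment removed unless it contains a keep word.

    strip_comments and keep_words are illustrative names, not part of any library.
    """
    doc = minidom.parseString(xml_text)

    def walk(node):
        # Iterate over a copy, because removeChild mutates node.childNodes.
        for child in list(node.childNodes):
            if child.nodeType == minidom.Node.COMMENT_NODE:
                # child.data is the comment text between <!-- and -->.
                if not any(word in child.data for word in keep_words):
                    node.removeChild(child)
            else:
                walk(child)

    walk(doc)
    return doc.toxml()


sample = """<?xml version="1.0"?>
<root>
    <!--The Sunshine State-->
    <state>FL</state>
    <!--HELP is needed-->
    <city>Miami</city>
</root>"""

result = strip_comments(sample)
print(result)
```

Unlike the stylesheet above, this sketch does not strip or re-indent whitespace, so the line that held a removed comment is left as a blank line in the output.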