I need an XML matching expression that lets me clean a file of all comments except the ones containing the word HELP.
I am currently using Perl to remove multiline comments (such as failed tests) from an XML file, except the ones I hand-picked because they are helpful to the end user:
perl -i -w -0777pe 's/<!--(.(?<!(HELP|TODO)))*?-->//sg' somefile.xml
However, if there is a way to get the same result with an XML matching expression, I would prefer it, since there may be edge cases that a regex cannot handle. For now, this is what I have to use.
Note: I will use it with xmlstarlet on Linux, so it would be better if the solution works with it too.
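One edge case of the kind mentioned above is text inside a CDATA section that merely looks like a comment. The sketch below is only an illustration using Python's standard library: the sample document and the helper names `regex_strip` and `parsed_comments` are made up for this example, and the regex is a simplified stand-in for the Perl one-liner, not a copy of it.

```python
import re
from xml.dom import minidom, Node

# Hypothetical sample: the CDATA section holds text that only looks like a comment.
DOC = "<root><![CDATA[<!-- just text -->]]><!-- real comment --></root>"


def regex_strip(text):
    """Naive approach: delete anything shaped like <!-- ... -->."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.S)


def parsed_comments(text):
    """Parser approach: list only the real comment nodes under the root."""
    root = minidom.parseString(text).documentElement
    return [n.data for n in root.childNodes if n.nodeType == Node.COMMENT_NODE]


if __name__ == "__main__":
    print(regex_strip(DOC))      # the CDATA content is deleted as well
    print(parsed_comments(DOC))  # only the real comment is reported
```

The regex empties the CDATA section along with the real comment, while the parser reports a single comment node, which is why a node-based match such as XPath `comment()` is the safer tool here.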
CodePudding user response:
Here is an XSLT-based solution.
The XSLT is very simple. A single one-line template removes the unwanted comments. The rest is just boilerplate for the so-called Identity Transform pattern.
$ xmlstarlet tr path/to/transformation.xslt path/to/source.xml
Input XML
<?xml version="1.0"?>
<root>
<!--The Sunshine State-->
<state>FL</state>
<!--HELP is needed-->
<city>Miami</city>
</root>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!--drops every comment that does not contain HELP; comments with HELP fall through to the identity template-->
<xsl:template match="comment()[not(contains(., 'HELP'))]" />
</xsl:stylesheet>
xmlstarlet
xmlstarlet ed -d '//comment()[not(contains(.,"HELP"))]' path/to/source.xml
Output
<?xml version='1.0' encoding='utf-8' ?>
<root>
<state>FL</state>
<!--HELP is needed-->
<city>Miami</city>
</root>
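If neither xmlstarlet nor an XSLT processor is at hand, the same filter can be approximated with Python's standard library. This is a minimal sketch, not part of the solution above: the function name `strip_comments` and the `keep_words` parameter are hypothetical, and `minidom` does not reindent the result, so the whitespace will differ from the Output shown.

```python
from xml.dom import minidom, Node

# Same sample as the Input XML above.
SAMPLE = """<?xml version="1.0"?>
<root>
<!--The Sunshine State-->
<state>FL</state>
<!--HELP is needed-->
<city>Miami</city>
</root>"""


def strip_comments(xml_text, keep_words=("HELP",)):
    """Remove every comment that contains none of keep_words."""
    doc = minidom.parseString(xml_text)
    doomed = []
    stack = [doc]
    # Walk the whole tree first, then delete, so the child lists
    # are never modified while they are being iterated.
    while stack:
        node = stack.pop()
        for child in node.childNodes:
            if child.nodeType == Node.COMMENT_NODE:
                if not any(word in child.data for word in keep_words):
                    doomed.append(child)
            else:
                stack.append(child)
    for comment in doomed:
        comment.parentNode.removeChild(comment)
    return doc.toxml()


if __name__ == "__main__":
    print(strip_comments(SAMPLE))
```

Passing `keep_words=("HELP", "TODO")` would mirror the two keywords used in the Perl one-liner from the question.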