I need to remove every tag from an XML document in which a certain text is found.
Example:
<root-element>
  <tag-name first:line="some-value">bla-bla</tag-name>
  <tag-name second:line="some-value">bla-bla</tag-name>
  <tag-name third:line="some-value">bla-bla</tag-name>
  <tag-name first:line="some-value">bla-bla</tag-name>
  <tag-name second:line="some-value">bla-bla</tag-name>
</root-element>
So for each tag with a first:line attribute in the XML document, I want to remove the whole tag.
CodePudding user response:
You'll need to use an XML parsing library.
I recommend lxml.
Then build an XPath selector that applies the string-length() function to text(). This way it selects any element that has text inside.
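import lxml.etree as et

# `xml` holds the document as str or bytes; recover=True tolerates the sample's undeclared first:/second: prefixes
tree = et.fromstring(xml, et.XMLParser(recover=True))
# normalize-space() ignores whitespace-only text, so the root element itself is not matched
for bad in tree.xpath("//*[string-length(normalize-space(text())) > 0]"):
    bad.getparent().remove(bad)
print(et.tostring(tree, pretty_print=True, xml_declaration=True).decode())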
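If the goal is to drop only the elements that carry a given attribute, rather than every element with text, the standard library may be enough. This is a minimal sketch, not a definitive implementation: it assumes the attribute is written without the undeclared prefix (firstline rather than first:line), and remove_with_attribute is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample input, assuming the prefixes have been dropped from the attribute names.
XML = """<root-element>
  <tag-name firstline="some-value">bla-bla</tag-name>
  <tag-name secondline="some-value">bla-bla</tag-name>
  <tag-name thirdline="some-value">bla-bla</tag-name>
  <tag-name firstline="some-value">bla-bla</tag-name>
  <tag-name secondline="some-value">bla-bla</tag-name>
</root-element>"""


def remove_with_attribute(root, attr_name):
    """Remove every descendant of root that has attr_name; return how many were removed."""
    removed = 0
    # ElementTree elements have no parent pointer, so visit each parent
    # and inspect a snapshot of its children before removing any of them.
    for parent in list(root.iter()):
        for child in list(parent):
            if attr_name in child.attrib:
                parent.remove(child)
                removed += 1
    return removed


root = ET.fromstring(XML)
count = remove_with_attribute(root, "firstline")
print(count)
print(ET.tostring(root, encoding="unicode"))
```

Iterating over copies (list(...)) avoids mutating a child list while looping over it.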
CodePudding user response:
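Here is how to do it via XSLT.
The XSLT uses the so-called identity transform pattern: one template copies every node and attribute unchanged, and a second, empty template suppresses any element that has a firstline attribute.
I modified the XML and removed the bogus namespace prefixes.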
Input XML
<?xml version="1.0"?>
<root-element>
  <tag-name firstline="some-value">bla-bla</tag-name>
  <tag-name secondline="some-value">bla-bla</tag-name>
  <tag-name thirdline="some-value">bla-bla</tag-name>
  <tag-name firstline="some-value">bla-bla</tag-name>
  <tag-name secondline="some-value">bla-bla</tag-name>
</root-element>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[@firstline]"/>
</xsl:stylesheet>
Output XML
<root-element>
  <tag-name secondline="some-value">bla-bla</tag-name>
  <tag-name thirdline="some-value">bla-bla</tag-name>
  <tag-name secondline="some-value">bla-bla</tag-name>
</root-element>
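To run a stylesheet like this from Python, lxml's XSLT class can apply it. The following is only a sketch under stated assumptions: lxml is installed, the input and the stylesheet are inlined as bytes, and the names XML_INPUT, STYLESHEET, transform and result are illustrative.

```python
import lxml.etree as et

# Input document, assuming attribute names without namespace prefixes.
XML_INPUT = b"""<?xml version="1.0"?>
<root-element>
  <tag-name firstline="some-value">bla-bla</tag-name>
  <tag-name secondline="some-value">bla-bla</tag-name>
  <tag-name thirdline="some-value">bla-bla</tag-name>
  <tag-name firstline="some-value">bla-bla</tag-name>
  <tag-name secondline="some-value">bla-bla</tag-name>
</root-element>"""

# Identity transform plus an empty template for elements with @firstline.
STYLESHEET = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*[@firstline]"/>
</xsl:stylesheet>"""

# Compile the stylesheet once, then apply it to the parsed input.
transform = et.XSLT(et.fromstring(STYLESHEET))
result = transform(et.fromstring(XML_INPUT))

# Serialize the transformed tree for display.
print(et.tostring(result, pretty_print=True).decode())
```

The compiled transform object can be reused for any number of input documents.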