Remove xml tag using python if a certain text is found

Time:12-16

I need to remove all tags from an XML document that contain a certain text.

Example:

<root-element>
    <tag-name first:line="some-value">bla-bla</tag-name>
    <tag-name second:line="some-value">bla-bla</tag-name>
    <tag-name third:line="some-value">bla-bla</tag-name>
    <tag-name first:line="some-value">bla-bla</tag-name>
    <tag-name second:line="some-value">bla-bla</tag-name>
</root-element>

So for each tag in the XML document that has a first:line attribute, I want to remove the whole tag.

CodePudding user response:

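You'll need to use an XML parsing library.

I recommend lxml.

Then build an XPath selector that tests for the first:line attribute. The sample never declares the first prefix, so a strict parser rejects it; parsing with recover=True should keep the attribute under its literal name first:line, which name() can then match. This way it selects only the elements that carry that attribute, and never the root.

import lxml.etree as et

# xml holds the document from the question as a string
parser = et.XMLParser(recover=True)  # tolerate the undeclared "first:" prefix
tree = et.fromstring(xml, parser)

# Match any element that has an attribute literally named "first:line"
for bad in tree.xpath("//*[@*[name()='first:line']]"):
    bad.getparent().remove(bad)

print(et.tostring(tree, pretty_print=True, xml_declaration=True).decode())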
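If the real document declares its prefixes, removing only the elements that carry the first:line attribute can also be sketched with the standard library, without lxml. This is a minimal sketch under assumptions: the urn:example:... namespace URIs and the helper name remove_with_attribute are made up for illustration, because the sample in the question never declares its prefixes.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the question's sample never declares its prefixes.
XML = """<root-element xmlns:first="urn:example:first"
              xmlns:second="urn:example:second"
              xmlns:third="urn:example:third">
    <tag-name first:line="some-value">bla-bla</tag-name>
    <tag-name second:line="some-value">bla-bla</tag-name>
    <tag-name third:line="some-value">bla-bla</tag-name>
    <tag-name first:line="some-value">bla-bla</tag-name>
    <tag-name second:line="some-value">bla-bla</tag-name>
</root-element>"""

# ElementTree stores a namespaced attribute under the key "{uri}localname".
FIRST_LINE = "{urn:example:first}line"


def remove_with_attribute(parent, attr_key):
    """Remove every descendant of parent that has the attribute attr_key."""
    # ElementTree elements have no getparent(), so walk down from the parent.
    # Copy the child list so removing while looping is safe.
    for child in list(parent):
        if attr_key in child.attrib:
            parent.remove(child)
        else:
            remove_with_attribute(child, attr_key)


root = ET.fromstring(XML)
remove_with_attribute(root, FIRST_LINE)
print(ET.tostring(root, encoding="unicode"))
```

ElementTree rewrites the remaining prefixes as generated names when serializing unless they are registered first with ET.register_namespace.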

CodePudding user response:

Here is how to do it via XSLT.

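The XSLT uses the so-called Identity Transform pattern: the first template copies every node and attribute unchanged, and the second, empty template drops any element that has a firstline attribute.

I modified the XML and removed the bogus (undeclared) namespace prefixes.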

Input XML

<?xml version="1.0"?>
<root-element>
    <tag-name firstline="some-value">bla-bla</tag-name>
    <tag-name secondline="some-value">bla-bla</tag-name>
    <tag-name thirdline="some-value">bla-bla</tag-name>
    <tag-name firstline="some-value">bla-bla</tag-name>
    <tag-name secondline="some-value">bla-bla</tag-name>
</root-element>

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*[@firstline]"/>
</xsl:stylesheet>

Output XML

<root-element>
  <tag-name secondline="some-value">bla-bla</tag-name>
  <tag-name thirdline="some-value">bla-bla</tag-name>
  <tag-name secondline="some-value">bla-bla</tag-name>
</root-element>
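
Since the question asks for Python, here is a rough sketch of running that stylesheet with lxml's XSLT support. It assumes lxml is installed; the variable names XML, XSLT, transform and result are only illustrative.

```python
import lxml.etree as et

# Input document from this answer (prefixes already removed).
XML = b"""<?xml version="1.0"?>
<root-element>
    <tag-name firstline="some-value">bla-bla</tag-name>
    <tag-name secondline="some-value">bla-bla</tag-name>
    <tag-name thirdline="some-value">bla-bla</tag-name>
    <tag-name firstline="some-value">bla-bla</tag-name>
    <tag-name secondline="some-value">bla-bla</tag-name>
</root-element>"""

# Identity transform plus one empty template that drops matching elements.
XSLT = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*[@firstline]"/>
</xsl:stylesheet>"""

# Compile the stylesheet once; the resulting object is callable on parsed documents.
transform = et.XSLT(et.fromstring(XSLT))
result = transform(et.fromstring(XML))

# str() serializes the result according to the stylesheet's xsl:output settings.
print(str(result))
```

The compiled transform can be reused for many documents, which helps when the same cleanup has to run over a batch of files.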