Removing nodes from an XML file using Python

Time:09-08

I'm trying to remove the SOAP envelope tags from this XML document:

<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
  <S:Body>
    <ns2:promediosSipsaCityResponse xmlns:ns2="http://servicios.sipsa.co.gov.dane/">
      <return>
        <city>BARRANQUILLA</city>
        <codProduct>1</codProduct>
        <send>0</send>
        <releaseDate>2020-03-18T00:00:00-05:00</releaseDate>
        <creationDate>2020-03-18T14:00:01-05:00</creationDate>
        <price>632</price>
        <product>Ahuyama</product>
        <regId>316989</regId>
      </return>
      <return>
        <city>BARRANQUILLA</city>
        <codProduct>2</codProduct>
        <send>0</send>
        <releaseDate>2020-03-18T00:00:00-05:00</releaseDate>
        <creationDate>2020-03-18T14:00:01-05:00</creationDate>
        <price>7733</price>
        <product>Arveja verde en vaina</product>
        <regId>316990</regId>
      </return>
    </ns2:promediosSipsaCiudadResponse>
  </S:Body>
</S:Envelope>

So it would look like this:

<return>
 <city>BARRANQUILLA</city>
 <codProduct>1</codProduct>
 <send>0</send>
 <releaseDate>2020-03-18T00:00:00-05:00</releaseDate>
 <creationDate>2020-03-18T14:00:01-05:00</creationDate>
 <price>632</price>
 <product>Ahuyama</product>
 <regId>316989</regId>
</return>
<return>
 <city>BARRANQUILLA</city>
 <codProduct>2</codProduct>
 <send>0</send>
 <releaseDate>2020-03-18T00:00:00-05:00</releaseDate>
 <creationDate>2020-03-18T14:00:01-05:00</creationDate>
 <price>7733</price>
 <product>Arveja verde en vaina</product>
 <regId>316990</regId>
</return>
    

I tried to use the ElementTree library to navigate through the elements and keep only the return children, but it's not working:

doc = etree.parse('result.xml')
for ele in doc.findall('//return'):
    parent = ele.getparent()
    print(parent)
    parent.remove()
doc.write('result2.xml', pretty_print=True)

Any feedback is welcome, thanks!

CodePudding user response:

Instead of modifying your original file, I would just create a new one and copy the relevant portions into it.

Notes: as mentioned in the comments, well-formed XML needs a single root element, so the bare return elements have to be wrapped in one. Also, the original XML in your question is not well-formed (the opening ns2:promediosSipsaCityResponse tag doesn't match its closing tag). Assuming that is fixed, you can do what you want with either ElementTree or lxml:

old = """<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
  <S:Body>
    <ns2:promediosSipsaCityResponse 
    [.... rest of your xml above...]
    </ns2:promediosSipsaCityResponse>
  </S:Body>
</S:Envelope>
"""
new = """<someroot></someroot>"""
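The two well-formedness points from the notes can be checked directly. The sketch below is only an illustration: is_well_formed is a hypothetical helper name, and the XML strings are shortened stand-ins for the document in the question.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's document: the opening tag says
# "City" but the closing tag says "Ciudad".
broken = (
    '<ns2:promediosSipsaCityResponse xmlns:ns2="http://servicios.sipsa.co.gov.dane/">'
    "<return><city>BARRANQUILLA</city></return>"
    "</ns2:promediosSipsaCiudadResponse>"
)

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(broken))                            # mismatched tag names
print(is_well_formed(broken.replace("Ciudad", "City")))  # closing tag corrected
print(is_well_formed("<return/><return/>"))              # two top-level elements, no root
```

Only the second call should report a well-formed document, which is why the closing tag has to be fixed and the return elements need a wrapper such as someroot.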

With ElementTree:

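import xml.etree.ElementTree as ET

old_doc = ET.fromstring(old)
new_doc = ET.fromstring(new)

# Append each <return> element to the new root, keeping document order
for ret in old_doc.findall('.//return'):
    new_doc.append(ret)
print(ET.tostring(new_doc).decode())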

Similarly, with lxml:

from lxml import etree

old_doc = etree.XML(old)
new_doc = etree.XML(new)

# In lxml, append moves each <return> element out of the old tree
for ret in old_doc.xpath('//return'):
    new_doc.append(ret)
print(etree.tostring(new_doc).decode())

Either version should print the return elements from your expected output, wrapped in the someroot element.
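For reference, here is a self-contained sketch of the ElementTree approach that runs without the elided XML. It assumes the closing tag has been corrected, drops a few child elements from each record for brevity, and uses a hypothetical helper name, extract_returns; someroot is the same placeholder root name as above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the SOAP response from the question, with the closing
# tag corrected so the document is well-formed (an assumption of this sketch).
SOAP = """<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
  <S:Body>
    <ns2:promediosSipsaCityResponse xmlns:ns2="http://servicios.sipsa.co.gov.dane/">
      <return>
        <city>BARRANQUILLA</city>
        <codProduct>1</codProduct>
        <price>632</price>
        <product>Ahuyama</product>
      </return>
      <return>
        <city>BARRANQUILLA</city>
        <codProduct>2</codProduct>
        <price>7733</price>
        <product>Arveja verde en vaina</product>
      </return>
    </ns2:promediosSipsaCityResponse>
  </S:Body>
</S:Envelope>
"""

def extract_returns(xml_text, root_tag="someroot"):
    """Collect every <return> element under a fresh root element."""
    old_root = ET.fromstring(xml_text)
    new_root = ET.Element(root_tag)
    # The <return> elements carry no namespace prefix, so a plain tag
    # name is enough for the search.
    for ret in old_root.findall(".//return"):
        new_root.append(ret)  # append keeps the original document order
    return new_root

new_root = extract_returns(SOAP)
print(ET.tostring(new_root, encoding="unicode"))
```

To save the result instead of printing it, ET.ElementTree(new_root).write('result2.xml') writes the new tree to the file name used in the question.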
