I'm trying to remove some SOAP envelope tags from this XML document:
<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
<S:Body>
<ns2:promediosSipsaCityResponse xmlns:ns2="http://servicios.sipsa.co.gov.dane/">
<return>
<city>BARRANQUILLA</city>
<codProduct>1</codProduct>
<send>0</send>
<releaseDate>2020-03-18T00:00:00-05:00</releaseDate>
<creationDate>2020-03-18T14:00:01-05:00</creationDate>
<price>632</price>
<product>Ahuyama</product>
<regId>316989</regId>
</return>
<return>
<city>BARRANQUILLA</city>
<codProduct>2</codProduct>
<send>0</send>
<releaseDate>2020-03-18T00:00:00-05:00</releaseDate>
<creationDate>2020-03-18T14:00:01-05:00</creationDate>
<price>7733</price>
<product>Arveja verde en vaina</product>
<regId>316990</regId>
</return>
</ns2:promediosSipsaCiudadResponse>
</S:Body>
</S:Envelope>
So it would look like this:
<return>
<city>BARRANQUILLA</city>
<codProduct>1</codProduct>
<send>0</send>
<releaseDate>2020-03-18T00:00:00-05:00</releaseDate>
<creationDate>2020-03-18T14:00:01-05:00</creationDate>
<price>632</price>
<product>Ahuyama</product>
<regId>316989</regId>
</return>
<return>
<city>BARRANQUILLA</city>
<codProduct>2</codProduct>
<send>0</send>
<releaseDate>2020-03-18T00:00:00-05:00</releaseDate>
<creationDate>2020-03-18T14:00:01-05:00</creationDate>
<price>7733</price>
<product>Arveja verde en vaina</product>
<regId>316990</regId>
</return>
I tried to use the ElementTree library to navigate through the elements and get just the return children, but it's not working:
doc = etree.parse('result.xml')
for ele in doc.findall('//return'):
    parent = ele.getparent()
    print(parent)
    parent.remove()
doc.write('result2.xml', pretty_print=True)
Any feedback is welcome, thanks!
CodePudding user response:
Instead of modifying your original file, I would just create a new one and copy the relevant portions into it.
Notes: as mentioned in the comments, you need a root element for well-formed XML. Also, the original XML in your question is not well formed (the opening ns2:promediosSipsaCityResponse tag doesn't match its closing tag). Assuming these are fixed, you can do what you want with either ElementTree or lxml:
old = """<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
<S:Body>
<ns2:promediosSipsaCityResponse
[.... rest of your xml above...]
</ns2:promediosSipsaCityResponse>
</S:Body>
</S:Envelope>
"""
new = """<someroot></someroot>"""
With ElementTree:
import xml.etree.ElementTree as ET

old_doc = ET.fromstring(old)
new_doc = ET.fromstring(new)
for ret in old_doc.findall('.//return'):
    # append() keeps the <return> elements in document order
    new_doc.append(ret)
print(ET.tostring(new_doc).decode())
Similarly, with lxml:
from lxml import etree

old_doc = etree.XML(old)
new_doc = etree.XML(new)
for ret in old_doc.xpath('//return'):
    # append() moves each <return> into the new tree, in document order
    new_doc.append(ret)
print(etree.tostring(new_doc).decode())
The output should match your expected output, with the return elements wrapped in the someroot element.
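For reference, here is a minimal self-contained sketch of the same idea using only the standard library. It assumes a shortened copy of the XML from the question with the closing tag corrected. The helper name extract_returns and the output file name result2.xml are illustrative choices, not part of any library.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SOAP response from the question, with the closing
# tag corrected so that the document is well-formed (an assumption).
SOAP_XML = """<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
<S:Body>
<ns2:promediosSipsaCityResponse xmlns:ns2="http://servicios.sipsa.co.gov.dane/">
<return><city>BARRANQUILLA</city><price>632</price><product>Ahuyama</product></return>
<return><city>BARRANQUILLA</city><price>7733</price><product>Arveja verde en vaina</product></return>
</ns2:promediosSipsaCityResponse>
</S:Body>
</S:Envelope>
"""


def extract_returns(xml_text, root_tag="someroot"):
    """Copy every <return> element into a fresh tree rooted at root_tag.

    Hypothetical helper for illustration; root_tag is a placeholder name.
    """
    old_root = ET.fromstring(xml_text)
    new_root = ET.Element(root_tag)
    # <return> carries no namespace prefix, so a plain tag name matches it.
    for ret in old_root.findall(".//return"):
        new_root.append(ret)  # append keeps the original document order
    return ET.ElementTree(new_root)


tree = extract_returns(SOAP_XML)
print(ET.tostring(tree.getroot(), encoding="unicode"))
# To save the result instead of printing it:
# tree.write("result2.xml", encoding="utf-8", xml_declaration=True)
```

Building the new root with ET.Element instead of parsing a string avoids the need for the placeholder XML literal, but either approach should work.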