I get an xml string from a post request and I need to use this xml in a subsequent request. I need to edit the XML from the first request to reflect the correct format for the subsequent request. I can successfully remove the name spaces but am struggling with extracting the desired node and keeping the xml formatting.
current format
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetExResponse xmlns="http://www.someurl.com/">
<GetExResult>
<DataMap xmlns="" sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>
</GetExResult>
</GetExResponse>
</soap:Body>
</soap:Envelope>
Desired Format
<?xml version="1.0" encoding="UTF-8"?>
<DataMap xmlns="" sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>
--removes namespaces
dmXML = xmlstring
from lxml import etree
root = etree.fromstring(dmXML)
for elem in root.getiterator():
elem.tag = etree.QName(elem).localname
etree.cleanup_namespaces(root)
test = etree.tostring(root).decode()
print(test)
--extracts desired node but into dataframe changing the formatting
xdf = pandas.read_xml(dmXML, xpath='.//DataMap/*', namespaces={"doc": "http://www.w3.org/2001/XMLSchema"})
xml = pandas.DataFrame.to_xml(xdf)
CodePudding user response:
You can simply extract the relevant portion into a new document:
import xml.etree.ElementTree as ET
root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')
print(ET.tostring(new_root, xml_declaration=True, encoding='UTF-8').decode())
Output:
<?xml version='1.0' encoding='UTF-8'?>
<DataMap sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
</DataMap>