Remove namespaces and nodes from XML string in python

Time:10-12

I get an XML string from a POST request and need to use it in a subsequent request, so I have to edit the XML from the first response to match the format the second request expects. I can successfully remove the namespaces, but I am struggling to extract the desired node while keeping the XML formatting.


Current format

<?xml version="1.0" encoding="UTF-8"?>

<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetExResponse xmlns="http://www.someurl.com/">
      <GetExResult>
        <DataMap xmlns="" sourceType="0">
          <FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
          <FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
        </DataMap>
      </GetExResult>
    </GetExResponse>
  </soap:Body>
</soap:Envelope>

Desired Format

<?xml version="1.0" encoding="UTF-8"?>
<DataMap xmlns="" sourceType="0">
  <FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
  <FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>

--removes namespaces

from lxml import etree

dmXML = xmlstring

root = etree.fromstring(dmXML)

# Replace each namespace-qualified tag with its local name
for elem in root.iter():
    elem.tag = etree.QName(elem).localname
# Drop namespace declarations that are no longer used
etree.cleanup_namespaces(root)
test = etree.tostring(root).decode()
print(test)

--extracts the desired node, but into a DataFrame, which changes the formatting

import pandas

xdf = pandas.read_xml(dmXML, xpath='.//DataMap/*', namespaces={"doc": "http://www.w3.org/2001/XMLSchema"})
xml = xdf.to_xml()

CodePudding user response:

You can simply extract the relevant portion into a new document:

import xml.etree.ElementTree as ET
root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')
print(ET.tostring(new_root, xml_declaration=True, encoding='UTF-8').decode())

Output:

<?xml version='1.0' encoding='UTF-8'?>
<DataMap sourceType="0">
          <FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
          <FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
        </DataMap>
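
The uneven indentation in that output is the whitespace the node carried inside the original envelope, including the trailing text after the closing tag. If the cleaner layout from the question is wanted, one possible approach is to clear the node's tail and re-indent it. This is only a sketch: it assumes Python 3.9 or later (for ET.indent), and the dmXML value below is a stand-in copied from the question's sample rather than a real response.

```python
import xml.etree.ElementTree as ET

# Stand-in input copied from the question's sample (an assumption, not real data)
dmXML = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetExResponse xmlns="http://www.someurl.com/">
      <GetExResult>
        <DataMap xmlns="" sourceType="0">
          <FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
          <FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
        </DataMap>
      </GetExResult>
    </GetExResponse>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(dmXML)

# DataMap resets the default namespace with xmlns="", so it can be found unqualified
new_root = root.find('.//DataMap')

# Drop the whitespace that followed </DataMap> in the original document
new_root.tail = None

# Re-indent the subtree relative to its new position as the document root (Python 3.9+)
ET.indent(new_root, space="  ")

result = ET.tostring(new_root, xml_declaration=True, encoding='UTF-8').decode()
print(result)
```

Note that ElementTree does not write the empty xmlns="" attribute back out, so if the receiving service requires it literally, it would have to be added in a separate step.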
      