cleanup_namespaces does not remove namespaces from XML

Time:07-17

Here is my XML string:

xml = '''
<exta>
<signature>This </signature>
<begin_date>2019-07-12T09:41:48.187</begin_date>
<ver>4</ver>
<maiden_bc>1549</maiden_bc>
<exta_id>12345</exta_id>
<nps_max_price xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <exta_id>72723</exta_id>
    <extended_datetime>2018-11-20T11:01:29.040</extended_datetime>
    <event_ind>E</event_ind>
    <maiden>12345</maiden>
    <patient_id>123</patient_id>
    <boss_id>123LHF</boss_id>
    <template_name/>
    <end_date>2019-01-01T00:00:00</end_date>
    <UYI_AMN xsi:nil="true"/>
    <dedt_bef_ATS xsi:nil="true"/>
    <form>W</form>
</nps_max_price>
</exta>
'''

I am using cleanup_namespaces to remove the namespaces from the XML string:

from lxml import etree

root = etree.fromstring(xml)
# Strip the namespace URI from every element tag ("{uri}tag" -> "tag")
for elem in root.iter():
    elem.tag = etree.QName(elem).localname

# Remove namespace declarations that are no longer used
etree.cleanup_namespaces(root)
print(etree.tostring(root).decode())

This gives me:

<exta>
<signature>This </signature>
<begin_date>2019-07-12T09:41:48.187</begin_date>
<ver>4</ver>
<maiden_bc>1549</maiden_bc>
<exta_id>12345</exta_id>
<nps_max_price xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <exta_id>72723</exta_id>
    <extended_datetime>2018-11-20T11:01:29.040</extended_datetime>
    <event_ind>E</event_ind>
    <maiden>12345</maiden>
    <patient_id>123</patient_id>
    <boss_id>123LHF</boss_id>
    <template_name/>
    <end_date>2019-01-01T00:00:00</end_date>
    <UYI_AMN xsi:nil="true"/>
    <dedt_bef_ATS xsi:nil="true"/>
    <form>W</form>
</nps_max_price>
</exta>

However, I expected the output to contain no namespace declarations or namespaced attributes (xmlns:xsi, xsi:nil, xsd, etc.). How can I do this?

Expected output:

<exta>
<signature>This </signature>
<begin_date>2019-07-12T09:41:48.187</begin_date>
<ver>4</ver>
<maiden_bc>1549</maiden_bc>
<exta_id>12345</exta_id>
<nps_max_price>
    <exta_id>72723</exta_id>
    <extended_datetime>2018-11-20T11:01:29.040</extended_datetime>
    <event_ind>E</event_ind>
    <maiden>12345</maiden>
    <patient_id>123</patient_id>
    <boss_id>123LHF</boss_id>
    <template_name/>
    <end_date>2019-01-01T00:00:00</end_date>
    <UYI_AMN/>
    <dedt_bef_ATS/>
    <form>W</form>
</nps_max_price>
</exta>

Answer:

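The code in the question removes namespaces from element tags, but none of the elements in your XML string are bound to a namespace, so the loop changes nothing. cleanup_namespaces, in turn, only removes namespace declarations that are no longer referenced anywhere in the tree, and xmlns:xsi is still referenced. That is why the output is identical to the input.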
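A minimal sketch of that behaviour, using two small made-up documents rather than the question's XML: a declaration that nothing references is dropped by cleanup_namespaces, while one still used by an attribute is kept, and plain element tags report no namespace.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The xsi prefix is declared, but no element or attribute uses it
unused = etree.fromstring(f'<a xmlns:xsi="{XSI}"><b/></a>')
etree.cleanup_namespaces(unused)
print(etree.tostring(unused).decode())  # <a><b/></a>

# Here xsi:nil still refers to the prefix, so the declaration survives
used = etree.fromstring(f'<a xmlns:xsi="{XSI}"><b xsi:nil="true"/></a>')
etree.cleanup_namespaces(used)
print(etree.tostring(used).decode())

# The element tags themselves are not in any namespace
print([etree.QName(e).namespace for e in used.iter()])  # [None, None]
```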

However, there are two namespaced attributes (xsi:nil). If you simply want to delete those attributes (or any namespaced attribute), here is how you can do it:

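for elem in root.iter():
    # Copy the keys first so attributes can be deleted while looping
    for attr in list(elem.attrib):
        # Namespaced attribute names look like "{uri}local"
        if etree.QName(attr).namespace:
            del elem.attrib[attr]

# With no xsi:nil left, the xmlns:xsi declaration is unused and gets dropped
etree.cleanup_namespaces(root)
print(etree.tostring(root).decode())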
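If the real data may also put element tags in a namespace (for example a default xmlns), the question's tag-renaming loop and the attribute deletion above can be combined. The sketch below is an assumption about what "remove all namespaces" should mean (keep local tag names, delete namespaced attributes); strip_all_namespaces is a hypothetical helper name, and the XML is a shortened copy of the question's.

```python
from lxml import etree


def strip_all_namespaces(root):
    """Hypothetical helper: drop namespaces from element tags, delete
    namespaced attributes, then remove the now-unused declarations."""
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them
        if not isinstance(elem.tag, str):
            continue
        # "{uri}local" -> "local"
        elem.tag = etree.QName(elem).localname
        # Copy the keys so the attribute map can be modified while looping
        for attr in list(elem.attrib):
            if etree.QName(attr).namespace:
                del elem.attrib[attr]
    etree.cleanup_namespaces(root)
    return root


xml = '''<exta>
<ver>4</ver>
<nps_max_price xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <UYI_AMN xsi:nil="true"/>
    <form>W</form>
</nps_max_price>
</exta>'''

root = strip_all_namespaces(etree.fromstring(xml))
print(etree.tostring(root).decode())
```

Note that this deletes namespaced attributes rather than renaming them, which matches the expected output in the question. If only xsi:nil needs to go, lxml's etree.strip_attributes(root, "{http://www.w3.org/2001/XMLSchema-instance}nil") followed by etree.cleanup_namespaces(root) is a shorter alternative for a single named attribute.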