How to remove root element from xml file using python

Time:11-23

I have a number of XML files in the following format:

<objects>
   <object>
      <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows/>
      </record>
   </object>
</objects>

This XML has two nested wrapper elements at the top, <objects> and <object>. I want to remove one of them so that the XML looks like this:

 <objects>
     <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows/>
     </record>
 </objects>

I have a folder full of these files and want to do this using Python. Is there any way?

CodePudding user response:

The direct way is shown below. If your real files are more complicated than one-object/one-record you'll have to be more specific with examples:

from xml.etree import ElementTree as et

xml = '''\
<objects>
   <object>
      <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows/>
      </record>
   </object>
</objects>
'''

objects = et.fromstring(xml)
objects.append(objects[0][0]) # move "record" out of "object" and append as child to "objects"
objects.remove(objects[0])    # remove empty "object"
et.indent(objects)            # reformat indentation (Python 3.9+)
et.dump(objects)              # show result

Output:

<objects>
  <record>
    <invoice_source>EMAIL</invoice_source>
    <invoice_capture_date>2022-11-18</invoice_capture_date>
    <document_type>INVOICE</document_type>
    <data_capture_provider_code>00001</data_capture_provider_code>
    <data_capture_provider_reference>1264</data_capture_provider_reference>
    <document_capture_provide_code>00002</document_capture_provide_code>
    <document_capture_provider_ref>1264</document_capture_provider_ref>
    <rows />
  </record>
</objects>

Another option that would handle any nested content in object:

objects = et.fromstring(xml)
objects = objects[0]     # extract "object" (lose "objects" layer)
objects.tag = 'objects'  # rename "object" tag
et.indent(objects)       # reformat indentation (Python 3.9+)
et.dump(objects)         # show result (same output)
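If a file can contain several <object> wrappers, both snippets above only deal with the first one. Here is a minimal sketch that unwraps every <object> in place and keeps the document order. It assumes the wrappers are always direct children of the root; the helper name unwrap_objects and the shortened sample XML are made up for illustration:

```python
from xml.etree import ElementTree as et

def unwrap_objects(root, wrapper_tag="object"):
    """Replace each direct child named wrapper_tag with that child's own children."""
    # findall() returns a plain list, so it is safe to modify root inside the loop
    for wrapper in root.findall(wrapper_tag):
        position = list(root).index(wrapper)   # where the wrapper currently sits
        root.remove(wrapper)
        for offset, child in enumerate(wrapper):
            root.insert(position + offset, child)  # put children in its place
    return root

# Hypothetical sample with two wrappers
xml = (
    "<objects>"
    "<object><record><a>1</a></record></object>"
    "<object><record><a>2</a></record></object>"
    "</objects>"
)

objects = unwrap_objects(et.fromstring(xml))
et.indent(objects)  # reformat indentation (Python 3.9+)
et.dump(objects)    # show result
```

Inserting at the wrapper's old position, rather than appending, keeps the records in their original order even if other elements sit between the wrappers.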

CodePudding user response:

My approach is to iterate over the children of <objects> (the <object> nodes) and move their <record> nodes up one level. After that, I can remove the <object> nodes.

import xml.etree.ElementTree as ET

doc = ET.parse("input.xml")
objects = doc.getroot()

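for obj in list(objects):        # iterate over a copy; the loop modifies objects
    for record in obj:
        objects.append(record)   # move each <record> up one level
    objects.remove(obj)          # then drop the now-redundant <object>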

doc.write("output.xml")

Here is the contents of output.xml:

<objects>
   <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows />
      </record>
   </objects>
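Since the question is about a whole folder of files, the same idea can be wrapped in a loop over the directory. This is only a sketch under some assumptions: the function names flatten_file and flatten_folder are hypothetical, every file is assumed to end in .xml, and results are written to a separate output folder so the originals are left untouched:

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def flatten_file(source, target):
    """Move the children of every <object> up into the root and save to target."""
    doc = ET.parse(source)
    root = doc.getroot()
    for obj in list(root):        # iterate over a copy; root changes in the loop
        if obj.tag != "object":
            continue              # leave any other top-level elements alone
        for child in list(obj):
            root.append(child)    # move the child up one level
        root.remove(obj)          # drop the <object> wrapper
    doc.write(target, encoding="utf-8", xml_declaration=True)

def flatten_folder(in_dir, out_dir):
    """Process every *.xml file in in_dir and write the results to out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(Path(in_dir).glob("*.xml")):
        target = out_path / source.name
        flatten_file(source, target)
        written.append(target)
    return written
```

A call such as flatten_folder("xml_in", "xml_out") (both folder names are placeholders) would then process every file in one go. The whitespace inside each file is carried over as-is, so calling ET.indent(root) before writing (Python 3.9+) is an option if tidy indentation matters.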