How to move only specific blocks of XML to a new XML file?

Time:08-03

I'm trying to filter an XML file so that only specific blocks are kept. The original XML looks like this:

<PROJECT>
<TASK>
    <INSTALL_METHOD installer="TYPE 1" />
    <FILE>
        <INSTALL_OPTIONS option="signature"/>
        <INSTALL_OPTIONS option="checksum"/>
    </FILE>
</TASK>
<TASK>
    <INSTALL_METHOD installer="TYPE 2" />
    <FILE>
        <INSTALL_OPTIONS option="signature"/>
        <INSTALL_OPTIONS option="checksum"/>
    </FILE>
</TASK>
<TASK>
    <INSTALL_METHOD installer="TYPE 3" />
    <FILE>
        <INSTALL_OPTIONS option="signature"/>
        <INSTALL_OPTIONS option="checksum"/>
    </FILE>
</TASK>
<TASK>
    <INSTALL_METHOD installer="TYPE 4" />
    <FILE>
        <INSTALL_OPTIONS option="signature"/>
        <INSTALL_OPTIONS option="checksum"/>
    </FILE>
</TASK>
</PROJECT>

Now I need to check the installer attribute of <INSTALL_METHOD installer="x" /> and move the entire enclosing TASK block to a new file. For example, if I want only TYPE 1 and TYPE 3, new.xml should look something like this:

<PROJECT>
<TASK>
    <INSTALL_METHOD installer="TYPE 1" />
    <FILE>
        <INSTALL_OPTIONS option="signature"/>
        <INSTALL_OPTIONS option="checksum"/>
    </FILE>
</TASK>
<TASK>
    <INSTALL_METHOD installer="TYPE 3" />
    <FILE>
        <INSTALL_OPTIONS option="signature"/>
        <INSTALL_OPTIONS option="checksum"/>
    </FILE>
</TASK>
</PROJECT>

I tried the approach below to locate the blocks based on the installer type, but I only get the attribute; I'm not able to get the subelements/children of this tag.

root = tree.getroot()
tasklist = root.find("TASK")
blocktype = root.findall(".//TASK/INSTALL_METHOD")
filelist = root.findall(".//TASK/FIND")
if blockType.text == "TYPE 1":
    for tasks in filelist:
         installer.getchildren()
tree.write("new.xml", encoding='UTF-8', xml_declaration=True)

CodePudding user response:

For each TASK, check the value of the installer attribute on the INSTALL_METHOD child element. Remove the TASKs for which the value is not "TYPE 1" or "TYPE 3".

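import xml.etree.ElementTree as ET

tree = ET.parse("input.xml")
root = tree.getroot()
wanted = {"TYPE 1", "TYPE 3"}

# findall() returns a list, so removing from root inside the loop is safe
for task in root.findall("TASK"):
    install_method = task.find("INSTALL_METHOD")
    # Drop the TASK if it has no INSTALL_METHOD or its installer is not wanted
    if install_method is None or install_method.get("installer") not in wanted:
        root.remove(task)

tree.write("new.xml", encoding="UTF-8", xml_declaration=True)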
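If the original tree should stay untouched, a variation is to build a new PROJECT root and copy the matching TASK elements into it instead of removing the others. The following is only a sketch under assumptions: the function name filter_tasks and the inline SAMPLE string are made up for illustration, and the sample is a shortened version of the XML from the question.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the input file from the question.
SAMPLE = """<PROJECT>
<TASK><INSTALL_METHOD installer="TYPE 1" /><FILE><INSTALL_OPTIONS option="signature"/></FILE></TASK>
<TASK><INSTALL_METHOD installer="TYPE 2" /><FILE><INSTALL_OPTIONS option="checksum"/></FILE></TASK>
<TASK><INSTALL_METHOD installer="TYPE 3" /><FILE><INSTALL_OPTIONS option="signature"/></FILE></TASK>
</PROJECT>"""


def filter_tasks(root, wanted):
    """Return a new root with deep copies of the TASK elements whose
    INSTALL_METHOD installer attribute is in `wanted` (hypothetical helper)."""
    new_root = ET.Element(root.tag, root.attrib)
    for task in root.findall("TASK"):
        method = task.find("INSTALL_METHOD")
        # The type is stored in an attribute, so read it with .get();
        # .text is None for an empty element like INSTALL_METHOD.
        if method is not None and method.get("installer") in wanted:
            new_root.append(copy.deepcopy(task))
    return new_root


root = ET.fromstring(SAMPLE)
filtered = filter_tasks(root, {"TYPE 1", "TYPE 3"})
kept = [t.find("INSTALL_METHOD").get("installer") for t in filtered.findall("TASK")]
print(kept)  # ['TYPE 1', 'TYPE 3']
```

To save the result, the new root can be wrapped and written the same way as above, e.g. ET.ElementTree(filtered).write("new.xml", encoding="UTF-8", xml_declaration=True). Each copied TASK keeps its children (FILE, INSTALL_OPTIONS), because the whole element is copied rather than just the INSTALL_METHOD tag.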