I'm trying to filter an XML file so that only specific blocks are kept. The original XML looks like this:
<PROJECT>
  <TASK>
    <INSTALL_METHOD installer="TYPE 1" />
    <FILE>
      <INSTALL_OPTIONS option="signature"/>
      <INSTALL_OPTIONS option="checksum"/>
    </FILE>
  </TASK>
  <TASK>
    <INSTALL_METHOD installer="TYPE 2" />
    <FILE>
      <INSTALL_OPTIONS option="signature"/>
      <INSTALL_OPTIONS option="checksum"/>
    </FILE>
  </TASK>
  <TASK>
    <INSTALL_METHOD installer="TYPE 3" />
    <FILE>
      <INSTALL_OPTIONS option="signature"/>
      <INSTALL_OPTIONS option="checksum"/>
    </FILE>
  </TASK>
  <TASK>
    <INSTALL_METHOD installer="TYPE 4" />
    <FILE>
      <INSTALL_OPTIONS option="signature"/>
      <INSTALL_OPTIONS option="checksum"/>
    </FILE>
  </TASK>
</PROJECT>
Now I need to check the installer attribute of <INSTALL_METHOD installer="x" /> and move each matching TASK block, in its entirety, to a new file. For example, if I want only TYPE 1 and TYPE 3, new.xml should look something like this:
<PROJECT>
  <TASK>
    <INSTALL_METHOD installer="TYPE 1" />
    <FILE>
      <INSTALL_OPTIONS option="signature"/>
      <INSTALL_OPTIONS option="checksum"/>
    </FILE>
  </TASK>
  <TASK>
    <INSTALL_METHOD installer="TYPE 3" />
    <FILE>
      <INSTALL_OPTIONS option="signature"/>
      <INSTALL_OPTIONS option="checksum"/>
    </FILE>
  </TASK>
</PROJECT>
I tried the approach below to locate each TASK based on its installer type, but I only get the attribute; I can't get the subelements (children) of the tag.
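import xml.etree.ElementTree as ET

tree = ET.parse("input.xml")
root = tree.getroot()
tasklist = root.find("TASK")                        # returns only the first TASK
blocktype = root.findall(".//TASK/INSTALL_METHOD")  # a list of INSTALL_METHOD elements
filelist = root.findall(".//TASK/FILE")             # a list of FILE elements
for method in blocktype:
    # installer is an attribute, so it is read with .get(), not .text
    if method.get("installer") == "TYPE 1":
        for task in filelist:
            list(task)  # getchildren() was removed in Python 3.9
tree.write("new.xml", encoding='UTF-8', xml_declaration=True)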
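For reference, a minimal sketch of how ElementTree separates an element's attributes from its children, run on one TASK block copied from the input above. The block is embedded as a string so the snippet runs standalone; TASK_XML is a name introduced only for this illustration. The key points: findall() returns a plain list, so it has no .text; the installer value is an attribute read with .get(); and children are reached by iterating the element or with a relative path.

```python
import xml.etree.ElementTree as ET

# One TASK block from the question's input, embedded for illustration (hypothetical name)
TASK_XML = """<TASK>
  <INSTALL_METHOD installer="TYPE 1" />
  <FILE>
    <INSTALL_OPTIONS option="signature"/>
    <INSTALL_OPTIONS option="checksum"/>
  </FILE>
</TASK>"""

task = ET.fromstring(TASK_XML)

# The installer value is an attribute, so read it with .get(); .text would be None here
installer = task.find("INSTALL_METHOD").get("installer")

# Iterating an Element yields its direct child elements
child_tags = [child.tag for child in task]

# Deeper descendants can be reached with a path relative to the element
options = [opt.get("option") for opt in task.findall("FILE/INSTALL_OPTIONS")]

print(installer)   # TYPE 1
print(child_tags)  # ['INSTALL_METHOD', 'FILE']
print(options)     # ['signature', 'checksum']
```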
Answer:
For each TASK, check the value of the installer attribute on its INSTALL_METHOD child element. Remove the TASKs for which the value is not "TYPE 1" or "TYPE 3".
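import xml.etree.ElementTree as ET

tree = ET.parse("input.xml")
root = tree.getroot()

wanted = {"TYPE 1", "TYPE 3"}

# findall() returns a list, so removing children from root inside the loop is safe
for task in root.findall("TASK"):
    install_method = task.find("INSTALL_METHOD")
    # Drop the whole TASK (with its FILE children) if it has no INSTALL_METHOD
    # or its installer is not one of the wanted types
    if install_method is None or install_method.get("installer") not in wanted:
        root.remove(task)

tree.write("new.xml", encoding='UTF-8', xml_declaration=True)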
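If the original tree should stay untouched, a variant is to build a fresh PROJECT root and append deep copies of only the wanted TASK blocks. This is a sketch under stated assumptions: select_tasks, TASK_TEMPLATE and SAMPLE are hypothetical names, and the sample input is generated from the four TASK blocks shown in the question instead of being read from input.xml, so the snippet runs on its own.

```python
import copy
import xml.etree.ElementTree as ET

# Rebuild the question's input (TYPE 1 to TYPE 4) so the sketch is self-contained
TASK_TEMPLATE = """  <TASK>
    <INSTALL_METHOD installer="{}" />
    <FILE>
      <INSTALL_OPTIONS option="signature"/>
      <INSTALL_OPTIONS option="checksum"/>
    </FILE>
  </TASK>
"""
SAMPLE = ("<PROJECT>\n"
          + "".join(TASK_TEMPLATE.format(f"TYPE {n}") for n in range(1, 5))
          + "</PROJECT>")


def select_tasks(source_root, wanted):
    """Return a new root holding deep copies of the TASKs whose
    INSTALL_METHOD installer attribute is in `wanted`.
    source_root itself is not modified."""
    new_root = ET.Element(source_root.tag, source_root.attrib)
    for task in source_root.iterfind("TASK"):
        method = task.find("INSTALL_METHOD")
        if method is not None and method.get("installer") in wanted:
            # deepcopy carries the FILE / INSTALL_OPTIONS children along with the TASK
            new_root.append(copy.deepcopy(task))
    return new_root


source = ET.fromstring(SAMPLE)
filtered = select_tasks(source, {"TYPE 1", "TYPE 3"})
ET.ElementTree(filtered).write("new.xml", encoding="UTF-8", xml_declaration=True)
```

To run it on the real file, pass ET.parse("input.xml").getroot() instead of source. Copying rather than removing leaves the parsed input intact, so the same tree can be filtered several times with different installer sets.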