I'm using xml.etree.ElementTree to extract part of an XML file into a new XML file. I parse the file like this:
import xml.etree.ElementTree as ET
tree = ET.parse("./VD.xml")
root = tree.getroot()
This is my XML file:
<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos>
<Info vdid="N1" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="N3" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T78" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
</Infos>
</XML_Head>
How can I keep only the Info elements with the vdid I want, or delete the ones I don’t want? For example, I want to keep the groups with vdid="T74", and the expected XML output is as follows:
<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
</Infos>
</XML_Head>
Thank you!
CodePudding user response:
You can store the vdid(s) you want to keep in a set, then go through the XML file and remove the unwanted ones:
import xml.etree.ElementTree as ET
tree = ET.parse("./VD.xml")
root = tree.getroot()
vdid_to_keep = {"T74"}
infos = root.find("Infos")
# findall() returns a list, so removing children while looping over it is safe
for info_tag in infos.findall('Info'):
    if info_tag.get("vdid") not in vdid_to_keep:
        infos.remove(info_tag)
tree.write("./output.xml")
output.xml:
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
</Info>
</Infos>
</XML_Head>
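The expected output in the question starts with an XML declaration, which a plain tree.write(path) call leaves out. A minimal sketch of one way to get it, assuming the same removal approach: pass encoding and xml_declaration=True when writing. The helper name keep_vdids and the shortened SAMPLE document are made up for illustration, and the result is written to an in-memory buffer instead of ./output.xml so the snippet runs on its own.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for VD.xml (hypothetical sample, same structure as the question)
SAMPLE = """<XML_Head version="1.1" listname="VD">
<Infos>
<Info vdid="N1" status="0"><lane vsrid="1" speed="54" /></Info>
<Info vdid="T74" status="0"><lane vsrid="1" speed="54" /></Info>
<Info vdid="T78" status="0"><lane vsrid="1" speed="54" /></Info>
<Info vdid="T74" status="0"><lane vsrid="2" speed="49" /></Info>
</Infos>
</XML_Head>"""


def keep_vdids(xml_text, vdids_to_keep):
    """Return the document as bytes with only the wanted Info elements left."""
    root = ET.fromstring(xml_text)
    infos = root.find("Infos")
    # findall() returns a list, so removing while looping does not skip elements
    for info in infos.findall("Info"):
        if info.get("vdid") not in vdids_to_keep:
            infos.remove(info)
    buffer = io.BytesIO()
    # xml_declaration=True makes write() emit the <?xml ...?> header line
    ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


result = keep_vdids(SAMPLE, {"T74"})
print(result.decode("utf-8"))
```

With a real file, the same two keyword arguments can be passed to tree.write("./output.xml", ...) directly.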
CodePudding user response:
Use XPath to select the elements you want:
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos>
<Info vdid="N1" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="N3" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T78" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
</Infos>
</XML_Head>'''
root = ET.fromstring(xml)
info_sub_list = root.findall('.//Info[@vdid="T74"]')
infos = root.find('.//Infos')
infos.clear()
infos.extend(info_sub_list)
ET.dump(root)
Output:
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos><Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
</Info>
</Infos></XML_Head>
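Note that Element.clear() also resets the element's attributes, text, and tail, which is why the whitespace around Infos disappears in the output above. A possible variation, sketched here under the assumption that you may want to keep several vdid values at once and leave Infos itself untouched, is to replace only the children with slice assignment. The source attribute on Infos and the shortened SAMPLE document are hypothetical, added only to show that attributes survive.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in document; the "source" attribute is hypothetical
SAMPLE = """<XML_Head version="1.1">
<Infos source="demo">
<Info vdid="N1" />
<Info vdid="T74" />
<Info vdid="T78" />
</Infos>
</XML_Head>"""

root = ET.fromstring(SAMPLE)
infos = root.find("Infos")
wanted = {"T74", "T78"}

# Slice assignment swaps out the child list only; the attributes, text,
# and tail of Infos are left as they were
infos[:] = [info for info in infos if info.get("vdid") in wanted]

remaining = [info.get("vdid") for info in infos]
print(remaining, infos.attrib)
```

A set is used here because the limited XPath support in ElementTree makes matching several attribute values in one expression awkward.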