How to use ElementTree in Python to extract XML elements that meet a specified condition

Time:11-30

I'm using xml.etree.ElementTree to extract part of an XML file into a new XML file. I parse the file as follows:

import xml.etree.ElementTree as ET
tree = ET.parse("./VD.xml")
root = tree.getroot()

This is my xml file:

<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos>
        <Info vdid="N1" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="N3" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T78" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
    </Infos>
</XML_Head>

How can I extract the Info elements with the vdid I want, or delete the ones I don’t want? For example, I want to keep the groups with vdid="T74", and the expected XML output is as follows:

<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
    </Infos>
</XML_Head>

Thank you!

CodePudding user response:

You can store the vdid(s) you want to keep in a set, then go through the Info elements and remove the unwanted ones:

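import xml.etree.ElementTree as ET

tree = ET.parse("./VD.xml")
root = tree.getroot()

# vdid values whose <Info> elements should be kept
vdid_to_keep = {"T74"}

infos = root.find("Infos")
# findall() returns a list, so removing children inside the loop is safe
for info_tag in infos.findall("Info"):
    if info_tag.get("vdid") not in vdid_to_keep:
        infos.remove(info_tag)

# By default, write() emits no XML declaration
tree.write("./output.xml")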

output.xml:

<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
        </Info>
    </Infos>
</XML_Head>
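
The output above has no XML declaration, while the expected output in the question does. Below is a minimal sketch of the same removal loop wrapped in a function and written with a declaration. The function name filter_infos and the shortened SAMPLE document are assumptions made for illustration, and an in-memory buffer stands in for the real files.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for VD.xml (hypothetical sample data)
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD">
    <Infos>
        <Info vdid="N1" status="0"></Info>
        <Info vdid="T74" status="0"></Info>
        <Info vdid="T78" status="0"></Info>
        <Info vdid="T74" status="0"></Info>
    </Infos>
</XML_Head>"""


def filter_infos(tree, vdid_to_keep):
    """Remove every <Info> whose vdid is not in vdid_to_keep (in place)."""
    infos = tree.getroot().find("Infos")
    # findall() returns a list, so removing while looping is safe
    for info_tag in infos.findall("Info"):
        if info_tag.get("vdid") not in vdid_to_keep:
            infos.remove(info_tag)
    return tree


tree = filter_infos(ET.parse(io.BytesIO(SAMPLE)), {"T74"})

# xml_declaration=True asks write() to emit the <?xml ...?> line
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```

To write to disk instead, pass a path such as "./output.xml" to tree.write() in place of the buffer. ElementTree writes the declaration with single quotes, which is equivalent XML.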

CodePudding user response:

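Use an XPath expression to select the matching Info elements, then replace the children of Infos with them. Note that clear() also resets the element's attributes, text, and tail, which is why the whitespace around the Infos tags is lost in the output.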

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos>
        <Info vdid="N1" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="N3" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T78" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
    </Infos>
</XML_Head>'''

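root = ET.fromstring(xml)
# Select every Info element whose vdid attribute is "T74"
info_sub_list = root.findall('.//Info[@vdid="T74"]')

infos = root.find('.//Infos')
# Drop all children of Infos, then re-attach only the matches
infos.clear()
infos.extend(info_sub_list)
ET.dump(root)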

Output:

<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos><Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
        </Info>
    </Infos></XML_Head>
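
The collapsed "<Infos><Info" and "</Infos></XML_Head>" in the output above come from clear(), which resets the text and tail of Infos along with its children. If the original layout matters, one option is to save those fields before clearing and restore them afterwards. This is only a sketch: the helper name keep_only and the shortened SAMPLE document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML in the answer (hypothetical sample data)
SAMPLE = """<XML_Head version="1.1" listname="VD">
    <Infos>
        <Info vdid="N1" status="0"/>
        <Info vdid="T74" status="0"/>
        <Info vdid="T78" status="0"/>
        <Info vdid="T74" status="1"/>
    </Infos>
</XML_Head>"""


def keep_only(root, vdid):
    """Keep only the <Info> children of <Infos> with the given vdid."""
    infos = root.find("Infos")
    matches = infos.findall(f"Info[@vdid='{vdid}']")

    # clear() wipes attributes, text, and tail, so save them first
    attrib, text, tail = dict(infos.attrib), infos.text, infos.tail
    infos.clear()

    # Restore what clear() removed, then re-attach the matches
    infos.attrib.update(attrib)
    infos.text, infos.tail = text, tail
    infos.extend(matches)
    return root


root = keep_only(ET.fromstring(SAMPLE), "T74")
result = ET.tostring(root, encoding="unicode")
print(result)
```

The remaining Info elements keep their own tail whitespace, so the indentation stays as it was in the source as long as the last kept element was also the last child originally. On Python 3.9 or later, ET.indent(root) is an alternative that re-indents the whole tree.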