Parse XML file in python and retrieve nested children

Time:06-24

This post has all the makings of a duplicate, and yet I cannot figure it out despite the dozen posts I've read.

Here is the XML (a short version):

<?xml version="1.0" encoding="iso-8859-1" standalone="yes" ?>
<coordinates xmlns="http://www.egi.com/coordinates_mff" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <sensorLayout>
        <name>HydroCel GSN 256 1.0</name>
        <sensors>
            <sensor>
                <name></name>
                <number>1</number>
                <type>0</type>
                <x>6.962</x>
                <y>5.382</y>
                <z>-2.191</z>
            </sensor>
            <sensor>
                <name></name>
                <number>2</number>
                <type>0</type>
                <x>6.484</x>
                <y>6.404</y>
                <z>-0.140</z>
            </sensor>
            <sensor>
                <name></name>
                <number>3</number>
                <type>0</type>
                <x>5.699</x>
                <y>7.208</y>
                <z>1.791</z>
            </sensor>
        </sensors>
    </sensorLayout>
    <acqTime>2006-04-28T15:32:00.000000-08:00</acqTime>
    <acqMethod>An Average of Many Data Sets&#x09;&#x09;&#x09;&#x09;</acqMethod>
    <defaultSubject>true</defaultSubject>
</coordinates>

How can I retrieve a list of sensors with the name, number, and coordinates for each sensor?

I am struggling to iterate over this tree:

import xml.etree.ElementTree as ET

tree = ET.parse(xml_fname)
root = tree.getroot()

# And then what?
# root.iter("sensor")  does not yield any element
# root.find("sensor")  returns None
# and so on...

The tag of the root looks weird to me:

root.tag
'{http://www.egi.com/coordinates_mff}coordinates'

Thanks for the help!

CodePudding user response:

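Every element lives in the default namespace declared on the root, which is why a plain "sensor" lookup finds nothing. You can just parse it with namespace support. See below.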

from xml.etree import ElementTree as ET

root = ET.parse(xml_fname).getroot()
# Map the empty prefix to the document's default namespace (needs Python 3.8+)
ns = {"": "http://www.egi.com/coordinates_mff"}
# Use an XPath to get all sensor elements
sensors = root.findall(".//sensorLayout/sensors/sensor", namespaces=ns)
# Collect the text of each child of every sensor; iterate the element directly,
# because Element.getchildren() was removed in Python 3.9
sensor_values = [{ct.tag.split("}")[-1]: ct.text for ct in sensor} for sensor in sensors]
# Dict key is formed by removing the namespace part in the tag - dirty!
print(sensor_values)

You get something like:

[
{'name': None, 'number': '1', 'type': '0', 'x': '6.962', 'y': '5.382', 'z': '-2.191'}, 
{'name': None, 'number': '2', 'type': '0', 'x': '6.484', 'y': '6.404', 'z': '-0.140'}, 
{'name': None, 'number': '3', 'type': '0', 'x': '5.699', 'y': '7.208', 'z': '1.791'}
]
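
If you would rather not strip the namespace from each tag by hand, one alternative is to bind the namespace URI to an explicit prefix and look up each field by name. The following is only a sketch: the prefix "mff", the helper name read_sensors, and the inline SAMPLE document are my own choices for illustration, not something from the question. It also assumes every sensor has number, x, y and z filled in, since int() and float() would fail on a missing value.

```python
import io
from xml.etree import ElementTree as ET

# "mff" is an arbitrary prefix chosen here; only the URI has to match the file.
NS = {"mff": "http://www.egi.com/coordinates_mff"}

# Cut-down stand-in for the real file, used only to make the sketch runnable.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" standalone="yes" ?>
<coordinates xmlns="http://www.egi.com/coordinates_mff">
    <sensorLayout>
        <sensors>
            <sensor>
                <name></name><number>1</number><type>0</type>
                <x>6.962</x><y>5.382</y><z>-2.191</z>
            </sensor>
        </sensors>
    </sensorLayout>
</coordinates>"""


def read_sensors(source):
    """Return one dict per sensor, with number as int and x/y/z as floats."""
    root = ET.parse(source).getroot()
    sensors = []
    for sensor in root.findall("mff:sensorLayout/mff:sensors/mff:sensor", NS):
        sensors.append({
            # findtext returns "" for an empty element such as <name></name>
            "name": sensor.findtext("mff:name", default="", namespaces=NS),
            "number": int(sensor.findtext("mff:number", namespaces=NS)),
            "x": float(sensor.findtext("mff:x", namespaces=NS)),
            "y": float(sensor.findtext("mff:y", namespaces=NS)),
            "z": float(sensor.findtext("mff:z", namespaces=NS)),
        })
    return sensors


# ET.parse accepts a file name or a file object, so pass xml_fname for a real file.
sensors = read_sensors(io.BytesIO(SAMPLE))
print(sensors)
```

Compared with the dict comprehension above, this gives you numbers instead of strings and an empty string instead of None for the empty name, at the cost of listing the fields explicitly.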