Why doesn't .findall read my XML file correctly?

Time:11-03

Consider the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<root
    xmlns="urn:schemas-upnp-org:device-1-0">
    <specVersion>
        <major>1</major>
        <minor>0</minor>
    </specVersion>
    <URLBase>http://192.168.1.1:80</URLBase>
    <device>
        <serviceList>
            <service>
                <serviceType>1</serviceType>
            </service>
        </serviceList>
        <deviceList>
            <device>
                <serviceList>
                    <service>
                        <serviceType>2</serviceType>
                    </service>
                </serviceList>
                <deviceList>
                    <device>
                        <serviceList>
                            <service>
                                <serviceType>3</serviceType>
                            </service>
                        </serviceList>
                    </device>
                </deviceList>
            </device>
        </deviceList>
        <presentationURL>/</presentationURL>
    </device>
</root>

I want to extract every serviceType from the serviceList directly under the top-level device, so for this example the output should be only 1.

So I wrote:

import xml.etree.ElementTree as ET

# inner_xml is a string holding the XML document shown above
root = ET.fromstring(inner_xml)
device = root.find('{urn:schemas-upnp-org:device-1-0}device')
for serviceType in device.findall(
        './/{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):
    print(serviceType.text)

But why am I getting 2 and 3 as well? They are not in the serviceList directly under device.

CodePudding user response:

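Your code asks for a recursive search by starting the path with .//, so it matches every serviceList at any depth below device, including those of the nested devices: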

for serviceType in device.findall(
        './/{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):

Drop the leading .// so that only the serviceList that is a direct child of device is matched:

for serviceType in device.findall(
        '{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):

Complete working code:

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0" encoding="utf-8"?>
<root
    xmlns="urn:schemas-upnp-org:device-1-0">
    <specVersion>
        <major>1</major>
        <minor>0</minor>
    </specVersion>
    <URLBase>http://192.168.1.1:80</URLBase>
    <device>
        <serviceList>
            <service>
                <serviceType>1</serviceType>
            </service>
        </serviceList>
        <deviceList>
            <device>
                <serviceList>
                    <service>
                        <serviceType>2</serviceType>
                    </service>
                </serviceList>
                <deviceList>
                    <device>
                        <serviceList>
                            <service>
                                <serviceType>3</serviceType>
                            </service>
                        </serviceList>
                    </device>
                </deviceList>
            </device>
        </deviceList>
        <presentationURL>/</presentationURL>
    </device>
</root>'''

root = ET.fromstring(xml)
device = root.find('{urn:schemas-upnp-org:device-1-0}device')
for serviceType in device.findall(
        '{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):
    print(serviceType.text)
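
To see the difference between the two path expressions side by side, here is a minimal sketch. It assumes a trimmed copy of the XML from the question (only one nested device), and the names NS, SAMPLE and service_types are made up for this illustration:

```python
import xml.etree.ElementTree as ET

# Namespace in ElementTree's "{uri}tag" notation (hypothetical helper constant).
NS = "{urn:schemas-upnp-org:device-1-0}"

# Trimmed copy of the document from the question: a single nested device.
SAMPLE = """<root xmlns="urn:schemas-upnp-org:device-1-0">
  <device>
    <serviceList><service><serviceType>1</serviceType></service></serviceList>
    <deviceList>
      <device>
        <serviceList><service><serviceType>2</serviceType></service></serviceList>
      </device>
    </deviceList>
  </device>
</root>"""


def service_types(device, path):
    """Return the text of every element matched by path, relative to device."""
    return [node.text for node in device.findall(path)]


device = ET.fromstring(SAMPLE).find(f"{NS}device")

# Leading './/' matches a serviceList at any depth below device.
recursive = service_types(device, f".//{NS}serviceList//{NS}serviceType")

# No leading './/': only a serviceList that is a direct child of device.
direct = service_types(device, f"{NS}serviceList//{NS}serviceType")

print(recursive)  # ['1', '2']
print(direct)     # ['1']
```

The second // in both paths is harmless here, because it only descends inside a serviceList that has already been selected.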

CodePudding user response:

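You can use the simple XPath ./device/serviceList/service/serviceType, which steps through direct children only, to find the nodes. You can also pass a namespace mapping as the second argument to any find function, so that you do not have to spell out the namespace for each node in the XPath expression. Note that mapping the empty prefix "" to a default namespace, as done below, requires Python 3.8 or later. You can read more about this in the ElementTree documentation under "Parsing XML with Namespaces".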

Code:

import xml.etree.ElementTree as ET

source = ...
root = ET.fromstring(source)

namespaces = {"": "urn:schemas-upnp-org:device-1-0"}
for node in root.iterfind("./device/serviceList/service/serviceType", namespaces):
    print(node.text)
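
If the empty-prefix mapping is not available in your Python version, a named prefix should work the same way. The following is only a sketch: the prefix name upnp and the trimmed SAMPLE document are assumptions made for this example, not part of the original question:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question: a single nested device.
SAMPLE = """<root xmlns="urn:schemas-upnp-org:device-1-0">
  <device>
    <serviceList><service><serviceType>1</serviceType></service></serviceList>
    <deviceList>
      <device>
        <serviceList><service><serviceType>2</serviceType></service></serviceList>
      </device>
    </deviceList>
  </device>
</root>"""

# "upnp" is an arbitrary prefix chosen for this example; only the URI matters.
namespaces = {"upnp": "urn:schemas-upnp-org:device-1-0"}

root = ET.fromstring(SAMPLE)

# Every step uses a single '/', so only direct children are followed and the
# serviceList of the nested device is never reached.
path = "./upnp:device/upnp:serviceList/upnp:service/upnp:serviceType"
top_level = [node.text for node in root.iterfind(path, namespaces)]

print(top_level)  # ['1']
```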