Let's take a look at the following XML file:
<?xml version="1.0" encoding="utf-8"?>
<root
xmlns="urn:schemas-upnp-org:device-1-0">
<specVersion>
<major>1</major>
<minor>0</minor>
</specVersion>
<URLBase>http://192.168.1.1:80</URLBase>
<device>
<serviceList>
<service>
<serviceType>1</serviceType>
</service>
</serviceList>
<deviceList>
<device>
<serviceList>
<service>
<serviceType>2</serviceType>
</service>
</serviceList>
<deviceList>
<device>
<serviceList>
<service>
<serviceType>3</serviceType>
</service>
</serviceList>
</device>
</deviceList>
</device>
</deviceList>
<presentationURL>/</presentationURL>
</device>
</root>
I want to extract only the service types from the serviceList directly under the top-level device,
so in this example the output should be only 1.
So I wrote:
import os
import sys
import xml.etree.ElementTree as ET
root = ET.fromstring(inner_xml)  # inner_xml holds the XML above
device = root.find('{urn:schemas-upnp-org:device-1-0}device')
for serviceType in device.findall(
        './/{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):
    print(serviceType.text)
But why am I getting 2 and 3 too? They aren't in the serviceList
directly under device.
CodePudding user response:
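Your code asks for a recursive search: the leading .// matches serviceList elements at any depth below device, so the lists of the nested devices are included too.
for serviceType in device.findall(
        './/{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):
Drop the leading .// so that only the serviceList that is a direct child of device is matched:
for serviceType in device.findall(
        '{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):
Working code below: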
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="utf-8"?>
<root
xmlns="urn:schemas-upnp-org:device-1-0">
<specVersion>
<major>1</major>
<minor>0</minor>
</specVersion>
<URLBase>http://192.168.1.1:80</URLBase>
<device>
<serviceList>
<service>
<serviceType>1</serviceType>
</service>
</serviceList>
<deviceList>
<device>
<serviceList>
<service>
<serviceType>2</serviceType>
</service>
</serviceList>
<deviceList>
<device>
<serviceList>
<service>
<serviceType>3</serviceType>
</service>
</serviceList>
</device>
</deviceList>
</device>
</deviceList>
<presentationURL>/</presentationURL>
</device>
</root>'''
root = ET.fromstring(xml)
device = root.find('{urn:schemas-upnp-org:device-1-0}device')
for serviceType in device.findall(
        '{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):
    print(serviceType.text)
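To see the two paths side by side, here is a minimal sketch on a trimmed-down version of the document above. The helper service_types and the constant NS are names made up for this illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the device description from the question.
XML = """<root xmlns="urn:schemas-upnp-org:device-1-0">
  <device>
    <serviceList><service><serviceType>1</serviceType></service></serviceList>
    <deviceList>
      <device>
        <serviceList><service><serviceType>2</serviceType></service></serviceList>
      </device>
    </deviceList>
  </device>
</root>"""

NS = "{urn:schemas-upnp-org:device-1-0}"  # namespace in Clark notation


def service_types(device, path):
    """Return the text of every element matched by path below device."""
    return [node.text for node in device.findall(path)]


device = ET.fromstring(XML).find(NS + "device")

# Leading './/' matches serviceList elements at any depth below device.
recursive = service_types(device, f".//{NS}serviceList//{NS}serviceType")
# Without it, only serviceList elements that are direct children match.
direct = service_types(device, f"{NS}serviceList//{NS}serviceType")

print(recursive)
print(direct)
```

The first list contains the service types of the nested device as well, while the second contains only the one from the top-level serviceList.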
CodePudding user response:
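You can use the simple XPath ./device/serviceList/service/serviceType
to find the nodes. You can also pass a namespaces mapping as the second argument to any find function, so that you don't have to spell out the namespace for each node in the XPath expression. Note that mapping the empty prefix "" to a default namespace, as done below, requires Python 3.8 or later. You can read more about this in the ElementTree documentation section "Parsing XML with Namespaces".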
Code:
import xml.etree.ElementTree as ET
source = ...
root = ET.fromstring(source)
namespaces = {"": "urn:schemas-upnp-org:device-1-0"}
for node in root.iterfind("./device/serviceList/service/serviceType", namespaces):
    print(node.text)
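Since source is elided above, here is a self-contained sketch of the same idea on a trimmed-down document. It is only an illustration: it uses a named prefix, upnp (chosen arbitrarily here), instead of the empty prefix, which should also work on Python versions older than 3.8.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the device description; not the full file.
SOURCE = """<root xmlns="urn:schemas-upnp-org:device-1-0">
  <device>
    <serviceList><service><serviceType>1</serviceType></service></serviceList>
    <deviceList>
      <device>
        <serviceList><service><serviceType>2</serviceType></service></serviceList>
      </device>
    </deviceList>
  </device>
</root>"""

# 'upnp' is an arbitrary prefix picked for this example.
NAMESPACES = {"upnp": "urn:schemas-upnp-org:device-1-0"}

root = ET.fromstring(SOURCE)

# Every step uses '/', so only the exact chain of direct children matches.
path = "./upnp:device/upnp:serviceList/upnp:service/upnp:serviceType"
top_level = [node.text for node in root.iterfind(path, NAMESPACES)]

print(top_level)
```

Because the path never uses //, the serviceType of the nested device is not reached.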