I am new to Python and am trying to extract element values from an XML document. Below is the code that I tried:
```python
import xml.etree.ElementTree as ET
a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
<countryCode>RO</countryCode>
<vatNumber>43097749</vatNumber>
<requestDate>2022-07-12 02:00</requestDate>
<valid>true</valid>
<name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
<address>MUNICIPIUL BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
</checkVatResponse>
</soap:Body>
</soap:Envelope>'''
tree = ET.ElementTree(ET.fromstring(a))
root = tree.getroot()
for cust in root.findall('Body/checkVatResponse'):
    name = cust.find('name').text
    print(name)
```
I want to extract `name` and `address` from the XML, but when I run the code above, nothing is printed. What is my mistake?
Regards, Mayank Pande
Answer:
Namespaces, dawg, namespaces! You can be damn sure that when Jay-Z rapped about having 99 problems, having to deal with XML with namespaces was definitely one of them!
See "Parsing XML with Namespaces" in the Python `xml.etree.ElementTree` documentation.
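The `Body` tag's namespace is `http://schemas.xmlsoap.org/soap/envelope/`, `checkVatResponse`'s is `urn:ec.europa.eu:taxud:vies:services:checkVat:types`, and `name` and `address` are both in `urn:ec.europa.eu:taxud:vies:services:checkVat:types` too, which they inherit from their parent, `checkVatResponse`, through its default `xmlns` declaration. ElementTree stores every namespaced tag as `{namespace-uri}localname`, so the unqualified path `'Body/checkVatResponse'` in your code matches nothing, `findall()` returns an empty list, and the body of your loop never runs.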
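A quick way to see these namespaces for yourself is to print the tag of every element ElementTree built. This is a minimal sketch using a trimmed copy of the SOAP response from the question (only `name` is kept inside `checkVatResponse`):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question
a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
<name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
</checkVatResponse>
</soap:Body>
</soap:Envelope>'''

root = ET.fromstring(a)

# iter() walks the root and all of its descendants in document order;
# each namespaced tag is stored as '{namespace-uri}localname'
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)
```

This prints:

```
{http://schemas.xmlsoap.org/soap/envelope/}Envelope
{http://schemas.xmlsoap.org/soap/envelope/}Body
{urn:ec.europa.eu:taxud:vies:services:checkVat:types}checkVatResponse
{urn:ec.europa.eu:taxud:vies:services:checkVat:types}name
```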
So, you can explicitly search for an element including its namespace, like so:

```python
root.findall('{http://schemas.xmlsoap.org/soap/envelope/}Body/{urn:ec.europa.eu:taxud:vies:services:checkVat:types}checkVatResponse')
```
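Spelling out the full URIs gets verbose. `find()` and `findall()` also accept a `namespaces` dictionary that maps prefixes to URIs, which keeps the paths short. This is a sketch; the prefixes `soap` and `v` are arbitrary names chosen here and do not have to match the prefixes used in the document:

```python
import xml.etree.ElementTree as ET

a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
<name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
<address>MUNICIPIUL BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
</checkVatResponse>
</soap:Body>
</soap:Envelope>'''

# Prefixes of our own choosing ("v" is made up), mapped to the real URIs
ns = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'v': 'urn:ec.europa.eu:taxud:vies:services:checkVat:types',
}

root = ET.fromstring(a)
results = []
for cust in root.findall('soap:Body/v:checkVatResponse', ns):
    # 'v:name' is expanded to '{urn:...:types}name' using the ns mapping
    name = cust.find('v:name', ns).text
    address = cust.find('v:address', ns).text
    results.append((name, address))
    print(name)
    print(address)
```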
Or, on Python 3.8 and later, you can ignore the namespace with the `{*}` wildcard:

```python
root.findall('{*}Body/{*}checkVatResponse')
```
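To see the difference directly, compare the unqualified path from the question with the wildcard version on the same (trimmed) input. A minimal sketch, assuming Python 3.8+ for the `{*}` syntax:

```python
import xml.etree.ElementTree as ET

a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
<name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
</checkVatResponse>
</soap:Body>
</soap:Envelope>'''

root = ET.fromstring(a)

# Unqualified names only match elements in NO namespace, so this is empty
plain = root.findall('Body/checkVatResponse')
# '{*}' matches the local name in any namespace (or none)
wild = root.findall('{*}Body/{*}checkVatResponse')

print(len(plain))  # 0
print(len(wild))   # 1
```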
Try this:

```python
import xml.etree.ElementTree as ET

a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
<countryCode>RO</countryCode>
<vatNumber>43097749</vatNumber>
<requestDate>2022-07-12 02:00</requestDate>
<valid>true</valid>
<name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
<address>MUNICIPIUL BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
</checkVatResponse>
</soap:Body>
</soap:Envelope>'''

# fromstring() already returns the root element (soap:Envelope)
root = ET.fromstring(a)
# '{*}' matches each tag in any namespace (Python 3.8+)
for cust in root.findall('{*}Body/{*}checkVatResponse'):
    name = cust.find('{*}name').text
    print(name)
    address = cust.find('{*}address').text
    print(address)
```
Output:

```
ROHLIG SUUS LOGISTICS ROMANIA S.R.L.
MUNICIPIUL BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1
```
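One caveat with `cust.find(...).text`: if an element is missing from a response, `find()` returns `None` and `.text` raises `AttributeError`. A possible variant, sketched below, uses `findtext()`, which returns `None` when no element matches (and `''` when the element exists but is empty). The helper name `extract_fields` and its `fields` parameter are hypothetical, made up for this illustration; the sample input deliberately omits `address`:

```python
import xml.etree.ElementTree as ET

# Sample response WITHOUT an <address> element, to show the missing case
a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
<name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
</checkVatResponse>
</soap:Body>
</soap:Envelope>'''


def extract_fields(xml_text, fields=('name', 'address')):
    """Hypothetical helper: map each field to its text in the first
    checkVatResponse, or None if that element is absent."""
    root = ET.fromstring(xml_text)
    resp = root.find('{*}Body/{*}checkVatResponse')
    if resp is None:
        return {}
    # findtext() returns None instead of raising when nothing matches
    return {field: resp.findtext('{*}' + field) for field in fields}


print(extract_fields(a))
# {'name': 'ROHLIG SUUS LOGISTICS ROMANIA S.R.L.', 'address': None}
```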