XML parsing in Python with XPath

I am trying to parse an XML file in Python with the built-in xml module and ElementTree, but whatever I try according to the documentation, it does not give me what I need. I am trying to extract all the value tags into a list.

<?xml version="1.0" encoding="UTF-8"?>
<CustomField xmlns="http://soap.sforce.com/2006/04/metadata">
    <fullName>testPicklist__c</fullName>
    <externalId>false</externalId>
    <label>testPicklist</label>
    <required>false</required>
    <trackFeedHistory>false</trackFeedHistory>
    <type>Picklist</type>
    <valueSet>
        <restricted>true</restricted>
        <valueSetDefinition>
            <sorted>false</sorted>
            <value>
                <fullName>a 32</fullName>
                <default>false</default>
                <label>a 32</label>
            </value>
            <value>
                <fullName>23 432;:</fullName>
                <default>false</default>
                <label>23 432;:</label>
            </value>

And here is the example code that I can't get to work. It's very basic; the only thing I have trouble with is the XPath.

from xml.etree.ElementTree import ElementTree

field_filepath= "./testPicklist__c.field-meta.xml"

mydoc = ElementTree()
mydoc.parse(field_filepath)
root = mydoc.getroot()

print(root.findall(".//value"))
print(root.findall(".//*/value"))
print(root.findall("./*/value"))

User response:

Since the root element declares the default namespace xmlns="http://soap.sforce.com/2006/04/metadata", every unprefixed element in the document belongs to this namespace. So you're actually looking for {http://soap.sforce.com/2006/04/metadata}value elements, and a plain .//value path matches nothing.
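
To see this for yourself, here is a minimal sketch. It assumes a shortened inline copy of the file (the closing tags are my assumption, since the listing in the question is cut off) and parses it from a string instead of from disk:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's file; the closing tags are assumed.
XML = """<CustomField xmlns="http://soap.sforce.com/2006/04/metadata">
    <valueSet>
        <valueSetDefinition>
            <value>
                <fullName>a 32</fullName>
            </value>
        </valueSetDefinition>
    </valueSet>
</CustomField>"""

root = ET.fromstring(XML)

# ElementTree stores the namespace URI inside the tag name itself.
print(root.tag)

# A path without the namespace therefore matches no element at all.
unqualified = root.findall(".//value")
print(unqualified)
```

The first print shows the tag in {uri}name form, and the second prints an empty list, which is what the three findall() calls in the question return.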

To find all <value> elements in this document, pass a prefix-to-URI mapping through the namespaces argument of findall() and use that prefix in the path:

from xml.etree.ElementTree import ElementTree

field_filepath= "./testPicklist__c.field-meta.xml"

mydoc = ElementTree()
mydoc.parse(field_filepath)
root = mydoc.getroot()

# get the namespace of root
ns = root.tag.split('}')[0][1:]

# create a dictionary with the namespace
ns_d = {'my_ns': ns}

# get all the values
values = root.findall('.//my_ns:value', namespaces=ns_d)

# print the values
for value in values:
    print(value)

Outputs:

<Element '{http://soap.sforce.com/2006/04/metadata}value' at 0x7fceea043ba0>
<Element '{http://soap.sforce.com/2006/04/metadata}value' at 0x7fceea043e20>
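
If you want the contents as a plain list rather than Element objects, the same mapping works for the child lookups too. This is only a sketch: it assumes you are after the <fullName> text of each <value>, and it uses a shortened inline copy of the file whose closing tags are assumed:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's file; the closing tags are assumed.
XML = """<CustomField xmlns="http://soap.sforce.com/2006/04/metadata">
    <fullName>testPicklist__c</fullName>
    <valueSet>
        <valueSetDefinition>
            <value>
                <fullName>a 32</fullName>
                <default>false</default>
                <label>a 32</label>
            </value>
            <value>
                <fullName>23 432;:</fullName>
                <default>false</default>
                <label>23 432;:</label>
            </value>
        </valueSetDefinition>
    </valueSet>
</CustomField>"""

root = ET.fromstring(XML)
ns_d = {"my_ns": "http://soap.sforce.com/2006/04/metadata"}

# For every <value>, read the text of its direct <fullName> child.
full_names = [
    value.findtext("my_ns:fullName", namespaces=ns_d)
    for value in root.findall(".//my_ns:value", namespaces=ns_d)
]
print(full_names)  # ['a 32', '23 432;:']
```

The top-level <fullName> (testPicklist__c) is not included, because the lookup is relative to each <value> element.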

Alternatively, you can skip the mapping and search for the fully qualified name {http://soap.sforce.com/2006/04/metadata}value directly:

# get all the values
values = root.findall('.//{http://soap.sforce.com/2006/04/metadata}value')
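
On Python 3.8 or newer there is also a wildcard form, {*}value, which matches value elements in any namespace. A small sketch, again assuming a shortened inline copy of the file with the closing tags filled in:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's file; the closing tags are assumed.
XML = """<CustomField xmlns="http://soap.sforce.com/2006/04/metadata">
    <valueSet>
        <valueSetDefinition>
            <value>
                <fullName>a 32</fullName>
                <label>a 32</label>
            </value>
            <value>
                <fullName>23 432;:</fullName>
                <label>23 432;:</label>
            </value>
        </valueSetDefinition>
    </valueSet>
</CustomField>"""

root = ET.fromstring(XML)

# {*} matches any namespace (requires Python 3.8+).
values = root.findall(".//{*}value")
labels = [value.findtext("{*}label") for value in values]

print(len(values))  # 2
print(labels)       # ['a 32', '23 432;:']
```

This is convenient for quick scripts, but it would also match a value element from an unrelated namespace, so the explicit forms above are safer when a document mixes namespaces.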