Parsing XML with ETREE: finding 'xl' element properties

Time:12-22

I have the (abbreviated) XML file below (I also changed the element names a bit to obscure the application).

<?xml version="1.0" encoding="UTF-8" ?>
<Workplace Type="PP-1"
            Version="0.2"
            xmlns:xl="http://www.w3.org/1999/xlink">
    <Template xl:actuate="perFile"
                xl:href="../templates/opt/CMPRfile"
                xl:show="none"
                xl:title="CPP1"
                xl:type="verycomplicated"/>
    <ProjectID xl:actuate="withtrain"
                xl:href="filename.ppp"
                xl:show="none"
                xl:type="evenmorecomplicated"/>
</Workplace>


I want to parse the XML file with ETREE and find the values of the 'xl:' elements. How exactly do I do that? They do not seem to be attributes or text. Is this some kind of special property? For example, I tried to find the value of 'href' using code like the snippet below.

I tried to look up what the 'xl' labels are, but had no luck. What is also curious is that when I print the attributes of the 'Workplace' node, I get 'Type' and 'Version', but not 'xmlns'. So I suspect this is some kind of special attribute? This is my first time doing serious XML parsing, so I am probably missing something here.
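For reference, a minimal sketch of why 'xmlns' does not show up: ElementTree consumes namespace declarations while parsing instead of storing them as attributes, and it rewrites prefixed names such as xl:href into the "{namespace-URI}localname" form. The declarations themselves can still be recovered with ET.iterparse and its "start-ns" event. The inline XML string is a trimmed copy of the example above (XML declaration omitted); the names XML and declared are only for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (illustrative only).
XML = """<Workplace Type="PP-1" Version="0.2"
           xmlns:xl="http://www.w3.org/1999/xlink">
    <ProjectID xl:href="filename.ppp" xl:show="none"/>
</Workplace>"""

root = ET.fromstring(XML)

# The xmlns:xl declaration is not kept as an ordinary attribute.
print(root.attrib)  # {'Type': 'PP-1', 'Version': '0.2'}

# Prefixed attributes are stored under "{namespace-URI}localname" keys.
print(root.find("ProjectID").attrib)

# Namespace declarations are reported as "start-ns" events carrying (prefix, uri).
declared = [
    prefix_uri
    for _event, prefix_uri in ET.iterparse(io.BytesIO(XML.encode("utf-8")),
                                           events=("start-ns",))
]
print(declared)  # [('xl', 'http://www.w3.org/1999/xlink')]
```

In other words, the "xl" prefix is only a shorthand in the file; what ElementTree keeps is the full namespace URI.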

I tried this:

    import xml.etree.ElementTree as ET

    xml_namespace = "{http://www.w3.org/1999/xlink}"
    tree = ET.parse(project_file_name)
    xml_root_element = tree.getroot()

    projectid_element = xml_root_element.find(xml_namespace + "ProjectId")

    # None of these work:
    value = projectid_element.text
    value = projectid_element.attrib["href"]
    value = projectid_element.attrib["xl:href"]

    print("Value: " + value)

I was expecting the value 'filename.ppp'.
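A minimal sketch of one way to get that value, assuming the document shown above. In the attempt above, find() with the xlink prefix and "ProjectId" returns None: the ProjectID element itself has no namespace (there is no default xmlns= declaration), only its attributes do, and ElementTree lookups are case-sensitive ("ProjectID", not "ProjectId"). The attribute key is the expanded name "{http://www.w3.org/1999/xlink}href". The constant XLINK is a name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI in ElementTree's "{...}" form (illustrative constant).
XLINK = "{http://www.w3.org/1999/xlink}"

XML = """<Workplace Type="PP-1" Version="0.2"
           xmlns:xl="http://www.w3.org/1999/xlink">
    <Template xl:href="../templates/opt/CMPRfile" xl:title="CPP1"/>
    <ProjectID xl:actuate="withtrain" xl:href="filename.ppp"/>
</Workplace>"""

# With a file on disk this would be ET.parse(project_file_name).getroot().
root = ET.fromstring(XML)

# The element name is not namespaced, so look it up by its plain (case-sensitive) name.
projectid_element = root.find("ProjectID")

# Element.get() returns None instead of raising KeyError if the attribute is missing.
value = projectid_element.get(XLINK + "href")
print("Value: " + value)  # Value: filename.ppp
```

Using get() rather than attrib[...] is a design choice: it makes a missing attribute show up as None instead of an exception.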

Edit 20221221_1557:

I did look at the article mentioned by FordPerfect, but I still do not seem to be able to extract the values. I have this code:


    tree, ns = parse_and_get_ns(project_file_name)
    xml_root_element = tree.getroot()
    print("xml_root_element type: " + str(type(xml_root_element)))
    print("Namespaces found: ")
    print(ns)
    elements = xml_root_element.iterfind("xl:href", ns)
    print("elements type: " + str(type(elements)))
    for ele in elements:
        print("elements ele object type: " + str(type(ele)))

and I get this as output:

xml_root_element type: <class 'xml.etree.ElementTree.Element'>
Namespaces found:
{'xl': '{http://www.w3.org/1999/xlink}'}

As you can see, the root element I am iterating over is indeed an Element, but the iteration yields no objects at all. I expected at least one.
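Two things likely explain the empty result, sketched below under the assumption that ns is meant to be an ElementTree prefix map. First, the values in a prefix map should be the bare namespace URI without braces; ElementTree adds the braces itself, so a value like '{http://www.w3.org/1999/xlink}' produces a name that never matches. Second, iterfind() matches elements, and xl:href is an attribute, so even a correct map finds nothing for the path "xl:href". ElementTree's limited XPath does support an attribute-existence predicate [@name], which can be combined with the prefix map:

```python
import xml.etree.ElementTree as ET

XML = """<Workplace Type="PP-1" Version="0.2"
           xmlns:xl="http://www.w3.org/1999/xlink">
    <Template xl:href="../templates/opt/CMPRfile" xl:title="CPP1"/>
    <ProjectID xl:href="filename.ppp" xl:show="none"/>
</Workplace>"""

root = ET.fromstring(XML)

# Prefix map: prefix -> bare namespace URI (no braces).
ns = {"xl": "http://www.w3.org/1999/xlink"}

# As a path step, "xl:href" means "child elements named xl:href"; there are none.
print(list(root.iterfind("xl:href", ns)))  # []

# "[@xl:href]" selects elements that carry an xl:href attribute.
for ele in root.iterfind(".//*[@xl:href]", ns):
    print(ele.tag, ele.get("{http://www.w3.org/1999/xlink}href"))
# Template ../templates/opt/CMPRfile
# ProjectID filename.ppp
```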

Answer:

Edit: Forget what I wrote!

Have a look at this answer:
https://stackoverflow.com/a/14853417/10576322

Answer:

After some fiddling I found out that these 'xl:' items are actually attribute keys whose names include the namespace URI. You can get them with the keys() method of an element, like so:

    xml_element = root.find(
        xml_namespace + "Model_Segment"
    )  # One of the elements in the XML example

    keys = xml_element.keys()
    for key in keys:
        print("key: " + key)

Output:

    key: {http://www.w3.org/1999/xlink}href
    key: {http://www.w3.org/1999/xlink}label
    key: {http://www.w3.org/1999/xlink}role
    key: {http://www.w3.org/1999/xlink}title
    key: {http://www.w3.org/1999/xlink}type
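Since keys() returns the attribute names, items() (or the attrib dict) gives the names together with their values. A small sketch, assuming you want the xlink attributes keyed by their local name without the "{namespace-URI}" part; the helper split_qname is hypothetical and not part of ElementTree, and the XML string is a trimmed copy of the question's example:

```python
import xml.etree.ElementTree as ET

XML = """<Workplace xmlns:xl="http://www.w3.org/1999/xlink">
    <ProjectID xl:actuate="withtrain" xl:href="filename.ppp"
               xl:show="none" xl:type="evenmorecomplicated"/>
</Workplace>"""

XLINK_URI = "http://www.w3.org/1999/xlink"


def split_qname(name):
    """Hypothetical helper: '{uri}local' -> ('uri', 'local'); 'local' -> (None, 'local')."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return None, name


root = ET.fromstring(XML)
projectid_element = root.find("ProjectID")

# Keep only attributes in the xlink namespace, keyed by their local name.
xlink_attrs = {}
for key, value in projectid_element.items():
    uri, local = split_qname(key)
    if uri == XLINK_URI:
        xlink_attrs[local] = value

print(xlink_attrs)
# {'actuate': 'withtrain', 'href': 'filename.ppp', 'show': 'none', 'type': 'evenmorecomplicated'}
```

Filtering on the URI rather than on the 'xl' prefix is deliberate: the prefix is chosen by whoever wrote the file, while the URI identifies the namespace.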

I will also mark FordPerfect's answer as accepted, since it contains very useful information in the context of this question.
