I have the (abbreviated) XML file below (I also changed the element names a bit to obscure the application).
<?xml version="1.0" encoding="UTF-8" ?>
<Workplace Type="PP-1"
           Version="0.2"
           xmlns:xl="http://www.w3.org/1999/xlink">
  <Template xl:actuate="perFile"
            xl:href="../templates/opt/CMPRfile"
            xl:show="none"
            xl:title="CPP1"
            xl:type="verycomplicated"/>
  <ProjectID xl:actuate="withtrain"
             xl:href="filename.ppp"
             xl:show="none"
             xl:type="evenmorecomplicated"/>
</Workplace>
I want to parse the XML file with ElementTree and find the values of the 'xl:' items. How exactly do I do that? They do not seem to be attributes or text. Is this some kind of special property? For example, I tried to find the value of 'href' using the code below.
I tried to look up what the 'xl' labels are, but had no luck. Curiously, when I print the attributes of the 'Workplace' node I get 'Type' and 'Version', but not 'xmlns'. So I suspect this is some kind of special attribute? This is my first time doing serious XML parsing, so I am probably missing something here.
I tried this:
import xml.etree.ElementTree as ET

xml_namespace = "{http://www.w3.org/1999/xlink}"
tree = ET.parse(project_file_name)
xml_root_element = tree.getroot()
projectid_element = xml_root_element.find(xml_namespace + "ProjectId")
# Doesn't work
value = projectid_element.text
value = projectid_element.attrib["href"]
value = projectid_element.attrib["xl:href"]
print("Value: " + value)
I was expecting the value 'filename.ppp'.
Edit 20221221_1557:
I did look at the article mentioned by FordPerfect, but I still do not seem to be able to extract the values. I have this code:
tree, ns = parse_and_get_ns(project_file_name)
xml_root_element = tree.getroot()
print("xml_root_element type: " + str(type(xml_root_element)))
print("Namespaces found: ")
print(ns)
elements = xml_root_element.iterfind("xl:href", ns)
print("elements type: " + str(type(elements)))
for ele in elements:
    print("elements ele object type: " + str(type(ele)))
and I get this as output:
xml_root_element type: <class 'xml.etree.ElementTree.Element'>
Namespaces found:
{'xl': '{http://www.w3.org/1999/xlink}'}
So you can see that the root element I am iterating over is indeed an element, and that the result does not contain any objects. However, I expected at least one to be there.
CodePudding user response:
Edit: Forget what I wrote!
Have a look at this answer:
https://stackoverflow.com/a/14853417/10576322
CodePudding user response:
After some fiddling I found out that these 'xl:' items are attribute keys. You can list them by calling the keys() method on an element, like so:
xml_element = root.find(
    xml_namespace + "Model_Segment"
)  # One of the elements in the XML example
keys = xml_element.keys()
for key in keys:
    print("key: " + key)
Output:
key: {http://www.w3.org/1999/xlink}href
key: {http://www.w3.org/1999/xlink}label
key: {http://www.w3.org/1999/xlink}role
key: {http://www.w3.org/1999/xlink}title
key: {http://www.w3.org/1999/xlink}type
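Once you know the key format, reading the value is a normal attribute lookup with the full "{namespace-uri}name" key. Below is a minimal sketch against a trimmed copy of the XML from the question (with the root element closed properly). The names XML_TEXT and XLINK are only placeholders for this illustration, and I am assuming that ProjectID itself is not in any namespace, which is how it appears in the sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (placeholder for the real file).
XML_TEXT = """<Workplace Type="PP-1" Version="0.2"
           xmlns:xl="http://www.w3.org/1999/xlink">
  <ProjectID xl:actuate="withtrain"
             xl:href="filename.ppp"
             xl:show="none"
             xl:type="evenmorecomplicated"/>
</Workplace>
"""

XLINK = "http://www.w3.org/1999/xlink"

root = ET.fromstring(XML_TEXT)

# The element name has no prefix, so it is looked up without a namespace.
project_id = root.find("ProjectID")

# A prefixed attribute is stored under "{namespace-uri}localname".
href = project_id.get(f"{{{XLINK}}}href")
print("Value:", href)
```

The same key also works with project_id.attrib[...]; get() just returns None instead of raising KeyError when the attribute is missing.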
I will also mark the answer from FordPerfect as the accepted answer, since it contains very useful information within the context of this question.
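As for 'xmlns' not showing up among the attributes of 'Workplace': as far as I can tell, the parser treats xmlns:... as a namespace declaration and does not keep it as an attribute. If you need the prefix-to-URI mapping, you can collect it from the "start-ns" events of iterparse. This is just a rough sketch on a trimmed sample; XML_TEXT is a placeholder I made up for the illustration.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed sample based on the XML from the question (placeholder data).
XML_TEXT = """<Workplace Type="PP-1" Version="0.2"
           xmlns:xl="http://www.w3.org/1999/xlink">
  <ProjectID xl:href="filename.ppp"/>
</Workplace>
"""

root = ET.fromstring(XML_TEXT)

# Only the ordinary attributes are kept; the xmlns declaration is not among them.
attribute_names = sorted(root.attrib)
print("Attributes:", attribute_names)

# Collect the declared prefixes and URIs while parsing.
namespaces = {}
source = io.BytesIO(XML_TEXT.encode("utf-8"))
for _event, (prefix, uri) in ET.iterparse(source, events=["start-ns"]):
    namespaces[prefix] = uri
print("Namespaces:", namespaces)

# Build the attribute key from the collected URI instead of hard-coding it.
href = root.find("ProjectID").get("{" + namespaces["xl"] + "}href")
print("Value:", href)
```

Note that the URI collected this way has no curly braces around it, so they have to be added when building the attribute key.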