Hi, I am attempting to parse XML by following this documentation: https://docs.python.org/3/library/xml.etree.elementtree.html
However, when I try to follow it, I get this error:
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('sitemap.xml')
>>> root = tree.getroot()
>>> print(root['loc'])
element indices must be integers
I am attempting to parse the loc value from the following sitemap.xml:
<root>
  <url>
    <loc>HTTPS://website.com/</loc>
    <lastmod>2022-10-10</lastmod>
  </url>
  <url>
    <loc>https://website.com/search/</loc>
    <lastmod>2022-10-10</lastmod>
  </url>
  <url>
    <loc>https://website.com/auth/user/</loc>
  </url>
</root>
UPDATE
I can get it to print out a single loc value via:
print(root[0][0].text)
However, I want to loop through all of these loc fields and print them out. How can I do so?
CodePudding user response:
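import xml.etree.ElementTree as ET

tree = ET.parse('sitemap.xml')
root = tree.getroot()
# Each direct child of the root is a <url> element
for url in root:
    # Look at every child of this <url> and print the text of the <loc> ones
    for child in url:
        if child.tag == 'loc':
            print(child.text)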
Note that this will only work if the XML has the exact structure you provided, that is, every loc tag is a direct child of a url tag and every url tag is a direct child of the root.
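If you would rather not depend on the nesting depth, Element.iter walks the whole tree and yields every element with a given tag. Below is a minimal sketch, assuming the tags carry no XML namespace (a file that declares the sitemaps.org xmlns would have namespace-qualified tags, and a plain 'loc' would not match them). The helper name extract_locs and the inline SAMPLE string are made up for illustration; SAMPLE is a copy of the XML from the question with the last url element closed, so the snippet runs without a file on disk.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of the sitemap from the question,
# used here instead of reading sitemap.xml from disk.
SAMPLE = """<root>
  <url>
    <loc>HTTPS://website.com/</loc>
    <lastmod>2022-10-10</lastmod>
  </url>
  <url>
    <loc>https://website.com/search/</loc>
    <lastmod>2022-10-10</lastmod>
  </url>
  <url>
    <loc>https://website.com/auth/user/</loc>
  </url>
</root>"""


def extract_locs(root):
    """Return the text of every <loc> element at any depth under root."""
    return [loc.text for loc in root.iter('loc')]


# For a real file, use: root = ET.parse('sitemap.xml').getroot()
root = ET.fromstring(SAMPLE)
locs = extract_locs(root)
for loc in locs:
    print(loc)
```

If you do want to keep the "loc directly under url" restriction, root.findall('url/loc') should select the same elements for this input using ElementTree's limited XPath support.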