parse xml with specific declaration

Time:10-12

Hi, I am attempting to parse XML by following this documentation: https://docs.python.org/3/library/xml.etree.elementtree.html

However, when I try to follow it, I get this error:

>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('sitemap.xml')
>>> root = tree.getroot()
>>> print(root['loc'])
element indices must be integers

I am attempting to parse the loc value from this sitemap.xml declaration:

<root>
  <url>
    <loc>HTTPS://website.com/</loc>
    <lastmod>2022-10-10</lastmod>
  </url>
  <url>
    <loc>https://website.com/search/</loc>
    <lastmod>2022-10-10</lastmod>
  </url>
  <url>
    <loc>https://website.com/auth/user/</loc>
  </url>
</root>

UPDATE: I can get it to print a single loc value via print(root[0][0].text)

However, I want to loop through all of the loc fields and print them out. How can I do so?

CodePudding user response:

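import xml.etree.ElementTree as ET

tree = ET.parse('sitemap.xml')
root = tree.getroot()
# Each direct child of the root is a <url> element.
for url in root:
    # Check each child of <url> and print the text of the <loc> ones.
    for child in url:
        if child.tag == 'loc':
            print(child.text)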

Note that this only works if the XML has exactly the structure you provided, i.e. every loc tag is a direct child of a url tag and every url tag is a direct child of the root.
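If the nesting might vary, one possible alternative is to let ElementTree search the whole tree with Element.iter, which visits matching elements at any depth. The sketch below is only an illustration: the SITEMAP string is a hypothetical inline copy of the sample so the snippet runs without a file, and loc_values is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of the sample sitemap (assumption: well-formed,
# no namespace declared on the root element).
SITEMAP = """<root>
  <url>
    <loc>https://website.com/</loc>
    <lastmod>2022-10-10</lastmod>
  </url>
  <url>
    <loc>https://website.com/search/</loc>
    <lastmod>2022-10-10</lastmod>
  </url>
</root>"""


def loc_values(root):
    """Return the text of every <loc> element found at any depth under root."""
    return [loc.text for loc in root.iter('loc')]


root = ET.fromstring(SITEMAP)
for value in loc_values(root):
    print(value)
```

With a file on disk, ET.parse('sitemap.xml').getroot() can be passed to loc_values instead. If the real file declares a default namespace (an assumption, since the sample shows none), the plain tag name 'loc' will not match; on Python 3.8+ something like root.findall('.//{*}loc') should match the tag in any namespace.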
