Selenium Python XML parsing

Time:12-02

I need to parse XML with Selenium, but the XML isn't a local file; it's on the web. Here is the site: https://www.thetutorsdirectory.com/usa/sitemap/sitemap_l1.xml. I need to get all the links, for example this one:

<url>
<loc>https://www.thetutorsdirectory.com/usa/location/private-tutor-anaheim</loc>
<changefreq>weekly</changefreq>
</url>

Please help me :)

I have tried multiple solutions that were posted on this site.

Answer:

A solution with BeautifulSoup, which downloads the XML with requests instead of Selenium:

import requests
from bs4 import BeautifulSoup

url = "https://www.thetutorsdirectory.com/usa/sitemap/sitemap_l1.xml"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0"
}

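# Download the sitemap and stop early on an HTTP error
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()

# The "xml" parser needs the lxml package to be installed
soup = BeautifulSoup(response.content, "xml")

# Each page URL in the sitemap is stored inside a <loc> element
for link in soup.select("loc"):
    print(link.text)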

Prints:

...

https://www.thetutorsdirectory.com/usa/location/private-tutor-wichita-falls
https://www.thetutorsdirectory.com/usa/location/private-tutor-wilmington
https://www.thetutorsdirectory.com/usa/location/private-tutor-winston-salem
https://www.thetutorsdirectory.com/usa/location/private-tutor-woodbridge
https://www.thetutorsdirectory.com/usa/location/private-tutor-worcester-usa
https://www.thetutorsdirectory.com/usa/location/private-tutor-yonkers
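Because the sitemap is a plain XML document, a browser driver is not needed to read it. If you would rather avoid the BeautifulSoup/lxml dependency, here is a minimal sketch that uses only Python's standard library. It assumes the file follows the usual sitemaps.org layout, where tags may carry the http://www.sitemaps.org/schemas/sitemap/0.9 namespace, so the code matches on the local tag name "loc" whether or not a namespace is present. The helper names extract_locs and fetch_sitemap are hypothetical. The short "Mozilla/5.0" User-Agent is also an assumption; the server may want a fuller string like the one in the answer.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_locs(xml_bytes):
    """Return the text of every <loc> element, with or without a namespace."""
    root = ET.fromstring(xml_bytes)
    locs = []
    for elem in root.iter():
        # Namespaced tags look like "{http://www.sitemaps.org/schemas/sitemap/0.9}loc",
        # so compare only the part after the closing brace
        if elem.tag.rsplit("}", 1)[-1] == "loc" and elem.text:
            locs.append(elem.text.strip())
    return locs


def fetch_sitemap(url):
    """Download the raw sitemap bytes (hypothetical helper; needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


# Offline sample built from the <url> entry shown in the question
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.thetutorsdirectory.com/usa/location/private-tutor-anaheim</loc>
<changefreq>weekly</changefreq>
</url>
</urlset>"""

print(extract_locs(sample))
# prints ['https://www.thetutorsdirectory.com/usa/location/private-tutor-anaheim']
```

For the live file, extract_locs(fetch_sitemap("https://www.thetutorsdirectory.com/usa/sitemap/sitemap_l1.xml")) should yield the same links that the BeautifulSoup version prints, provided the server accepts the request.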