I need to parse XML with Selenium, but the XML is not a local file; it is served from a website. The sitemap is at https://www.thetutorsdirectory.com/usa/sitemap/sitemap_l1.xml and I need to get all the links from it, for example the one in this entry:
<url>
<loc>https://www.thetutorsdirectory.com/usa/location/private-tutor-anaheim</loc>
<changefreq>weekly</changefreq>
</url>
Please help me :)
I have already tried several of the solutions posted on this site.
CodePudding user response:
A solution with BeautifulSoup:
import requests
from bs4 import BeautifulSoup

url = "https://www.thetutorsdirectory.com/usa/sitemap/sitemap_l1.xml"
# Send a browser-like User-Agent with the request
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0"
}

# The "xml" parser requires the lxml package to be installed
soup = BeautifulSoup(requests.get(url, headers=headers).content, "xml")

# Each <loc> element holds one link
for link in soup.select("loc"):
    print(link.text)
Prints:
...
https://www.thetutorsdirectory.com/usa/location/private-tutor-wichita-falls
https://www.thetutorsdirectory.com/usa/location/private-tutor-wilmington
https://www.thetutorsdirectory.com/usa/location/private-tutor-winston-salem
https://www.thetutorsdirectory.com/usa/location/private-tutor-woodbridge
https://www.thetutorsdirectory.com/usa/location/private-tutor-worcester-usa
https://www.thetutorsdirectory.com/usa/location/private-tutor-yonkers
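If you would rather not install BeautifulSoup and lxml, the same extraction can probably be done with the standard library alone. This is a minimal sketch, not a tested solution for this exact site: `extract_locs` and `sample` are names made up for illustration, and the namespace URI is the one from the sitemaps.org protocol, which I am assuming this sitemap declares (the code falls back to un-namespaced tags if it does not).

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the standard sitemaps.org namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    # Try the namespaced tag first, then fall back to a plain <loc>.
    locs = root.findall(".//sm:loc", SITEMAP_NS) or root.findall(".//loc")
    return [loc.text.strip() for loc in locs if loc.text]


# Hypothetical sample built from the entry shown in the question.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.thetutorsdirectory.com/usa/location/private-tutor-anaheim</loc>
<changefreq>weekly</changefreq>
</url>
</urlset>"""

for link in extract_locs(sample):
    print(link)
```

To run it against the live sitemap, pass the downloaded body instead of `sample`, for example `extract_locs(requests.get(url, headers=headers).content)`.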