I'm trying to scrape specific links from a website. I'm using Python and Selenium 4.8. The HTML looks like this, with multiple list items, each containing a link:
<li>
<div >
<div >
<h4 >
<a href="https://www.example_link1.com">
</a>
</h4>
</div>
</div>
</li>
<li>...</li>
<li>...</li>
So each <li> contains a link.
Ideally, I would like a Python list of all the hrefs, which I can then iterate through to get additional output.
Thank you for your help!
CodePudding user response:
You can try something like the following (untested, as you didn't confirm the URL):
[...]
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
[...]
wait = WebDriverWait(driver, 25)
[...]
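# wait until the links are present, then collect their href values; tighten the XPath predicates to fit the real page
wanted_elements = [x.get_attribute('href') for x in wait.until(EC.presence_of_all_elements_located((By.XPATH, '//li//h4/a[@href]')))]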
For more details, see the Selenium documentation.
CodePudding user response:
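from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.example.com")
# find_elements_by_xpath was removed in Selenium 4.3; use find_elements(By.XPATH, ...)
# match every <a> with an href inside an <li>; tighten the predicate for the real page
links = driver.find_elements(By.XPATH, '//li//a[@href]')
hrefs = []
for link in links:
    hrefs.append(link.get_attribute('href'))
driver.quit()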
This gives you a list, hrefs, containing the href of every matching link on the page. You can then iterate through this list and use the links for further processing.
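Before iterating, it may help to tidy the collected values, since get_attribute('href') returns None when an element has no href and the same link can appear in more than one list item. The following is only a minimal sketch: clean_hrefs and the sample URLs are hypothetical names and data made up for illustration, not part of Selenium or of the page in the question.

```python
def clean_hrefs(raw_hrefs):
    """Return the hrefs with empty values and duplicates removed, keeping first-seen order."""
    seen = set()
    cleaned = []
    for href in raw_hrefs:
        if not href:
            # skip None and empty strings
            continue
        if href in seen:
            # skip links that were already collected
            continue
        seen.add(href)
        cleaned.append(href)
    return cleaned


# Hypothetical sample standing in for the hrefs list built with Selenium
sample = [
    "https://www.example_link1.com",
    None,
    "https://www.example_link1.com",
    "https://www.example_link2.com",
    "",
]

for url in clean_hrefs(sample):
    print(url)
```

In the scraping script, you would pass the hrefs list in place of sample and run your further processing inside the loop.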