I want to use Selenium to scrape links from a web page. The problem is that the page has tabs to select different subsections (Scene 1, Scene 2, etc.), and the relevant links are hidden behind these subsections. The HTML stays the same when switching between them. With the following code I only get the links for the first subsection:
download_page_links = driver.find_elements(by=By.XPATH, value="//a[@href]")
download_page_links_href = [e.get_attribute("href") for e in download_page_links]
The HTML when subsection 1 is selected looks like this:
<ul>
  <li>
    <a href='#' id="scene0">1</a>
  </li>
  <li>
    <a href='#' id="scene1" class>2</a>
  </li>
</ul>
The HTML when subsection 2 is selected looks like this:
<ul>
  <li>
    <a href='#' id="scene0" class>1</a>
  </li>
  <li>
    <a href='#' id="scene1">2</a>
  </li>
</ul>
How can I scrape through the different subsections?
CodePudding user response:
Thank you very much for your answers. I have solved the problem as follows:
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# Collect the tab elements scene0, scene1, ... that actually exist on the page
scene_list = []
for n in range(0, 20):
    try:
        scene_list.append(driver.find_element(By.XPATH, '//*[@id="scene' + str(n) + '"]'))
    except NoSuchElementException:
        pass

# Click each tab once so that every subsection's links are loaded into the DOM
for scene in scene_list:
    scene.click()

# Then scan the whole page for links
download_page_links = driver.find_elements(by=By.XPATH, value="//a[@href]")
download_page_links_href = [e.get_attribute("href") for e in download_page_links]
I let the script click on each scene, thereby enabling the links to appear. Then I scan the page for the appropriate links.
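For reference, a variant that keeps the links grouped per scene is sketched below. It is only a sketch built on the same driver and scene_list as above; the explicit wait and the links_per_scene dictionary are my own additions, and the wait condition is a placeholder that should be adapted to the actual page.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

links_per_scene = {}
for scene in scene_list:
    scene.click()
    # Placeholder wait: block until at least one link with an href is present
    WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.XPATH, "//a[@href]"))
    )
    anchors = driver.find_elements(By.XPATH, "//a[@href]")
    # Store the hrefs under the tab's id, deduplicated while preserving order
    links_per_scene[scene.get_attribute("id")] = list(
        dict.fromkeys(a.get_attribute("href") for a in anchors)
    )

Note that if the links of earlier scenes stay in the DOM after switching tabs, the per-scene lists will overlap, so a final deduplication across all scenes may still be needed.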