Scraping through subsections of a web page

Time:08-15

I want to use Selenium to scrape links from a web page. The page has tabs that select different subsections (Scene 1, Scene 2, etc.), and the relevant links are hidden behind these tabs. The HTML structure stays the same whichever tab is selected. With the following code I only get the links for the first subsection.

download_page_links = driver.find_elements(by=By.XPATH, value="//a[@href]")
download_page_links_href = [e.get_attribute("href") for e in download_page_links]

The HTML when subsection 1 is selected looks like this:

<ul>
   <li>
      <a href='#' id="scene0" >1</a>
   </li>
   <li>
      <a href='#' id="scene1" class>2</a>
   </li>
</ul>

The HTML when subsection 2 is selected looks like this:

<ul>
   <li>
      <a href='#' id="scene0" class>1</a>
   </li>
   <li>
      <a href='#' id="scene1" >2</a>
   </li>
</ul>

How can I scrape through the different subsections?

CodePudding user response:

Thank you very much for your answers. I have solved the problem as follows:

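    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    # Collect the tab elements scene0, scene1, ... (ids 0 to 19); skip ids that do not exist.
    scene_list = []
    for n in range(0, 20):
        try:
            scene_list.append(driver.find_element(By.XPATH, '//*[@id="scene' + str(n) + '"]'))
        except NoSuchElementException:
            pass

    # Click each tab and gather the links that are present afterwards.
    # The hrefs are accumulated so that earlier scenes are not overwritten.
    download_page_links_href = []
    for scene in scene_list:
        scene.click()
        download_page_links = driver.find_elements(by=By.XPATH, value="//a[@href]")
        download_page_links_href.extend(e.get_attribute("href") for e in download_page_links)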

I let the script click on each scene, thereby enabling the links to appear. Then I scan the page for the appropriate links.
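
A variation on the same idea, as a rough sketch rather than a tested solution for this particular page: because "//a[@href]" matches every anchor on the page after each click, the same link can be collected more than once, so the version below keeps only the first occurrence of each href. The function name collect_scene_links and the max_scenes parameter are made up for illustration. It assumes the links are already in the DOM right after click() returns (if the page loads them asynchronously, an explicit wait would be needed). The locator strings "id" and "xpath" are the values of Selenium's By.ID and By.XPATH, so the function works with a normal WebDriver instance without importing anything itself.

```python
def collect_scene_links(driver, max_scenes=20):
    """Click each scene tab in turn and return the unique hrefs, in first-seen order.

    `driver` is expected to be a Selenium WebDriver (or any object offering the
    same find_elements(by, value) method).
    """
    seen = {}  # a dict keeps insertion order, so it doubles as an ordered set
    for n in range(max_scenes):
        # find_elements returns an empty list instead of raising when nothing
        # matches, so no try/except is needed. "id" is the value of By.ID.
        tabs = driver.find_elements("id", f"scene{n}")
        if not tabs:
            continue  # no tab with this id on the page
        tabs[0].click()
        # "xpath" is the value of By.XPATH.
        for anchor in driver.find_elements("xpath", "//a[@href]"):
            href = anchor.get_attribute("href")
            if href:
                seen.setdefault(href, None)
    return list(seen)
```

Calling collect_scene_links(driver) with the driver from the question would then return one flat list of hrefs for all scenes. Looking the tab up again on every iteration, instead of storing the elements first, is a deliberate choice here: it avoids holding references that could go stale if a click re-renders the tab list.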
