How can I open every link within my forum after filtering all of the available href tags?

Is it possible to click on, or open in a new tab, every matching link so that I can scrape my forum? I filter the forum's links by URL, grabbing every link that contains viewthread, but when I try to click them the script just ends with no errors. Can someone explain why? I am very new to web scraping.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service('C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service=webdriver_service, options=options)
url = "https://navalcommand.enjin.com/forum/viewforum/2989694/m/11178354/page/1"
driver.get(url)
wait = WebDriverWait(driver, 100)

elems = driver.find_elements(By.XPATH, "//table[@class='structure small-cells']//a[@href]")

for elem in elems:
    if "viewthread" in elem.get_attribute('href'):
        print(elem.get_attribute("href"))

links = driver.find_elements(By.XPATH, "//table[@class='structure small-cells']//a[@href]")

for link in links:
    if "veiwthread" in link.get_attribute("href"):
        wait = WebDriverWait(driver, 10)
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//table[@class='structure small-cells']//a[@href]']")))
        print(driver.page_source)

        link = driver.find_element(By.XPATH, ".//a[@href]")
        link.click()

CodePudding user response：

I think enjin.com uses some mechanism that blocks Selenium from scraping. With ChromeDriver the browser closes right away, but with Firefox's GeckoDriver you can see that Selenium gets stuck at this page, so it fails to select the elements in the latter part of your code. You might want to look into how to conceal Selenium.

CodePudding user response：

This would be my approach:

from selenium import webdriver
from selenium.webdriver.common import window
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)  # keep the browser open after the script ends
options.add_argument("start-maximized")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
wait = WebDriverWait(driver, 100)  # must be created after the driver exists
driver.get("https://navalcommand.enjin.com/forum/viewforum/2989694/m/11178354/page/1")
elems = driver.find_elements(By.XPATH, "//table[@class='structure small-cells']//a[@href]")

# Collect the URLs as plain strings before navigating anywhere
links = []
for ele in elems:
    href = ele.get_attribute("href")
    if "viewthread" in href:
        links.append(href)

# Open each thread in its own tab
for link in links:
    driver.switch_to.new_window(window.WindowTypes.TAB)
    driver.get(link)

Note that elems is a list of Selenium WebElement objects, and what we need is the href of each one, so the first loop collects those URLs as plain strings before the second loop opens them.
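
The filtering step can also be checked without a browser. Here is a minimal sketch, assuming a hypothetical helper thread_links that receives the href strings already read from the elements; the sample URLs are made up for illustration. It also suggests why the second loop in the question does nothing: the substring there is spelled "veiwthread", which never matches, so the loop body is skipped without raising any error.

```python
def thread_links(hrefs, keyword="viewthread"):
    """Return the unique hrefs that contain `keyword`, in first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        # An href value may be missing (None), so skip empty values
        if href and keyword in href and href not in seen:
            seen.add(href)
            result.append(href)
    return result


# Hypothetical sample data standing in for hrefs read from the page
sample = [
    "https://forum.example/viewthread/1",
    "https://forum.example/viewthread/1",   # duplicate link to the same thread
    "https://forum.example/viewforum/page/2",
    None,
]

print(thread_links(sample))                # only the thread URL, once
print(thread_links(sample, "veiwthread"))  # misspelled keyword matches nothing
```

With Selenium, the input would be something like [e.get_attribute("href") for e in elems], and each returned URL can then be passed to driver.get. Removing duplicates is an added design choice here, on the assumption that a forum page may link to the same thread more than once.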