Selenium scrolling randomly

Time:10-15

I am trying to scrape table data from several hundred pages on a website. Here is part of what I have so far:

driver.get("link")
driver.maximize_window()
window_before = driver.window_handles[0]
driver.switch_to.window(window_before)
wait = WebDriverWait(driver, 10)
driver.execute_script("window.scrollTo(0, 350)")
games = driver.find_elements(By.XPATH, '//*[@id="schedule"]/tbody/tr')

This code only works sometimes. If I run this chunk 10 times, the page actually scrolls down only about 5 times. I tried using this instead:

for i in range(0, 2):
    driver.find_element(By.XPATH, '//*[@id="meta"]/div[1]/p[1]/a').send_keys(Keys.DOWN)

but the same issue arises. Sometimes that scrolls down the amount I need, other times it does nothing, and other times it scrolls the entire page.

This part of my code navigates to the first link I need to click. On the next page I need to scroll again, and the same issue appears there. All of this runs inside a loop over several hundred pages of HTML tables, so even if it works the first 50 times, I won't get all the data I need.

Edit: Directly after the above snippet I have this:

for idx, game in enumerate(games):
    driver.find_element(By.XPATH, '/html/body/div[2]/div[6]/div[3]/div[2]/table/tbody/tr[' + str(idx + 1) + ']/td[6]/a').click()

This is where I get the "element is not clickable at point (X, Y)" error.

Am I doing something wrong here, or is there a work around to accomplish my goal?

CodePudding user response:

Here is one way to collect the href attribute of every 'Box Score' link on that page (following OP's clarification in the comments):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--window-size=1280,720")

webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
actions = ActionChains(browser)
url = 'https://www.basketball-reference.com/leagues/NBA_2014_games-october.html'
browser.get(url)

# print(browser.page_source)
# browser.maximize_window()
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@]'))).click()
    print('clicked cookie parent')
    wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@mode="primary"]'))).click()
    print('accepted cookies')
except Exception:  # e.g. the wait timing out because no cookie banner appeared
    print('no cookies')
# Reading this property scrolls the element into view as a side effect
wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@id="all_schedule"]'))).location_once_scrolled_into_view
table_with_score_links = wait.until(EC.presence_of_element_located((By.XPATH, '//table[@id="schedule"]')))
# print(table_with_score_links.get_attribute('outerHTML'))
links_from_table = [x.get_attribute('href') for x in table_with_score_links.find_elements(By.TAG_NAME, 'a') if x.text == 'Box Score']
print(links_from_table)

Result printed in terminal:

clicked cookie parent
accepted cookies
['https://www.basketball-reference.com/boxscores/201310290IND.html', 'https://www.basketball-reference.com/boxscores/201310290MIA.html', 'https://www.basketball-reference.com/boxscores/201310290LAL.html', 'https://www.basketball-reference.com/boxscores/201310300CLE.html', 'https://www.basketball-reference.com/boxscores/201310300TOR.html', 'https://www.basketball-reference.com/boxscores/201310300PHI.html', 'https://www.basketball-reference.com/boxscores/201310300DET.html', 'https://www.basketball-reference.com/boxscores/201310300NYK.html', 'https://www.basketball-reference.com/boxscores/201310300NOP.html', 'https://www.basketball-reference.com/boxscores/201310300MIN.html', 'https://www.basketball-reference.com/boxscores/201310300HOU.html', 'https://www.basketball-reference.com/boxscores/201310300SAS.html', 'https://www.basketball-reference.com/boxscores/201310300DAL.html', 'https://www.basketball-reference.com/boxscores/201310300UTA.html', 'https://www.basketball-reference.com/boxscores/201310300PHO.html', 'https://www.basketball-reference.com/boxscores/201310300SAC.html', 'https://www.basketball-reference.com/boxscores/201310300GSW.html', 'https://www.basketball-reference.com/boxscores/201310310CHI.html', 'https://www.basketball-reference.com/boxscores/201310310LAC.html']

I tried to make the variable names as descriptive as possible, and I left some commented-out lines in place to show the steps taken on the way to the end result.

You can now go through those links one by one, etc.
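
As a rough sketch of that loop (not part of the code I ran above), the helper below loads each URL in turn and hands the browser to a callback. `visit_each` and `handle_page` are hypothetical names of mine, and the only Selenium call it assumes is `browser.get(url)`.

```python
def visit_each(browser, links, handle_page):
    """Load every URL in `links` and collect what `handle_page` returns.

    `browser` is assumed to be a Selenium WebDriver (anything with a
    `get(url)` method works). `handle_page` is a hypothetical callback
    that receives the browser once the page has been requested.
    """
    results = {}
    for link in links:
        browser.get(link)                   # navigate directly, no click needed
        results[link] = handle_page(browser)
    return results
```

With the objects above it could be called as `visit_each(browser, links_from_table, lambda b: b.page_source)`. Because it works from plain URL strings rather than stored elements, there is nothing to scroll to or click between pages.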

Selenium documentation can be found here: https://www.selenium.dev/documentation/
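
If you do still need to click an element, as in the loop from the question, one common workaround for the "element is not clickable at point (X, Y)" error is to center the element in the viewport first. This is a minimal sketch, assuming the error comes from the element sitting outside the viewport or under a sticky header. It will not help if something like a cookie banner covers the element. `scroll_and_click` is a hypothetical helper name, and it relies only on `execute_script` and `click`.

```python
def scroll_and_click(driver, element):
    """Center `element` in the viewport, then click it.

    `driver` is assumed to be a Selenium WebDriver and `element` a
    WebElement found through it. Scrolling to the element itself avoids
    guessing a pixel offset such as window.scrollTo(0, 350).
    """
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    element.click()
```

A call would look like `scroll_and_click(driver, driver.find_element(By.XPATH, some_xpath))`, where `some_xpath` stands for whatever locator you already use.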
