I am trying to scrape data from a table on each of several hundred pages of a website. Here is part of what I have so far:
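```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

window_before = driver.window_handles[0]  # handle of the original tab
wait = WebDriverWait(driver, 10)
driver.execute_script("window.scrollTo(0, 350)")  # scroll 350 px down
games = driver.find_elements(By.XPATH, '//*[@id="schedule"]/tbody/tr')  # one element per table row
```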
This code only works intermittently: if I run this chunk 10 times, the page actually scrolls down only about 5 times. I also tried this:
```python
from selenium.webdriver.common.keys import Keys

for i in range(0, 2):
    driver.find_element(By.XPATH, '//*[@id="meta"]/div[1]/p[1]/a').send_keys(Keys.DOWN)
```
but the same issue arises. Sometimes that scrolls down the amount I need, other times it does nothing, and other times it scrolls the entire page.
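One plausible cause of the intermittent scrolling is that `window.scrollTo` runs before the page has finished loading, so the layout can still shift afterwards. Below is a minimal sketch, assuming only that `driver` exposes Selenium's `execute_script`. It polls `document.readyState` before scrolling. The helper name `wait_for_ready_state` is hypothetical, and `FakeDriver` is just a stand-in so the sketch runs without a browser.

```python
import time


def wait_for_ready_state(driver, timeout=10.0, poll_interval=0.2):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    `driver` is any object with an execute_script(script) method, such as a
    Selenium WebDriver. Returns True if the page reported 'complete' in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


class FakeDriver:
    """Stand-in for a WebDriver: reports 'loading' a few times, then 'complete'."""

    def __init__(self, loading_polls):
        self.remaining = loading_polls
        self.scripts = []

    def execute_script(self, script, *args):
        self.scripts.append(script)
        if self.remaining > 0:
            self.remaining -= 1
            return "loading"
        return "complete"


driver = FakeDriver(loading_polls=2)
if wait_for_ready_state(driver, timeout=5, poll_interval=0.01):
    driver.execute_script("window.scrollTo(0, 350)")  # scroll only once the page is ready
```

With a real driver, `WebDriverWait(driver, 10).until(lambda d: d.execute_script("return document.readyState") == "complete")` expresses the same wait. This will not help if the table is injected later by JavaScript. In that case, waiting for the table element itself is more robust.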
This part of my code navigates to the first link I need to click. On the next page I have to scroll again, and the same issue appears there. All of this runs inside a loop that reads HTML tables from several hundred pages, so even if it works the first 50 times, I won't get all the data I need.
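Because the loop covers several hundred pages, even an occasional failure will eventually hit. As a stopgap that does not fix the underlying cause, a small retry wrapper can re-attempt a flaky step a few times before giving up. This is a minimal sketch. The name `retry` and its parameters are hypothetical. With Selenium you would typically pass exception types such as `ElementClickInterceptedException` from `selenium.common.exceptions`.

```python
import time


def retry(action, attempts=3, delay=0.5, exceptions=(Exception,)):
    """Call action() up to `attempts` times, sleeping `delay` seconds between tries.

    Only exception types listed in `exceptions` trigger a retry; anything else
    propagates immediately. The last error is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Demo: a step that fails twice before succeeding.
calls = {"count": 0}


def flaky_step():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("element is not clickable at point (X, Y)")
    return "clicked"


result = retry(flaky_step, attempts=5, delay=0.01, exceptions=(RuntimeError,))
```

In the real loop, re-locate the element inside the callable, for example `retry(lambda: driver.find_element(By.XPATH, xpath).click(), exceptions=(ElementClickInterceptedException,))`. That way each attempt works with a fresh element rather than a stale one.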
Edit: Directly after the above snippet I have this:
```python
for idx, game in enumerate(games):
    # XPath positions are 1-based, hence idx + 1
    driver.find_element(By.XPATH, '/html/body/div[2]/div[6]/div[3]/div[2]/table/tbody/tr[' + str(idx + 1) + ']/td[6]/a').click()
```
That line is where I get the "element is not clickable at point (X, Y)" error.
Am I doing something wrong here, or is there a workaround to accomplish my goal?
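The "not clickable at point" error usually means another element, such as a sticky header, an ad, or a cookie banner, sits on top of the target at the moment of the click. A common workaround is to scroll the element to the middle of the viewport with JavaScript's `scrollIntoView` before clicking. This sketch assumes `driver` is a Selenium WebDriver and `element` a WebElement. The helper name is hypothetical, and the `Fake*` classes are stand-ins so it runs without a browser.

```python
def scroll_into_view_and_click(driver, element):
    """Center `element` in the viewport, then click it.

    Centering (block: 'center') moves the element away from fixed headers or
    footers at the edges of the viewport that could intercept the click.
    """
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center', inline: 'nearest'});", element
    )
    element.click()


class FakeElement:
    """Stand-in for a WebElement that records whether it was clicked."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


class FakeDriver:
    """Stand-in for a WebDriver that records execute_script calls."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))


driver = FakeDriver()
row_link = FakeElement()
scroll_into_view_and_click(driver, row_link)
```

As a last resort, `driver.execute_script("arguments[0].click();", element)` bypasses the overlap check entirely. However, it does not behave exactly like a real user click.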
Answer:
Here is one way to get the `href` attribute of every 'Box Score' link on that page (based on the OP's clarification in the comments):
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

chrome_options = Options()
webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
actions = ActionChains(browser)

url = ''
browser.get(url)
# print(browser.page_source)
# browser.maximize_window()

# Dismiss the cookie-consent dialog if one appears
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@]'))).click()
    print('clicked cookie parent')
    wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@mode="primary"]'))).click()
    print('accepted cookies')
except Exception as e:
    print('no cookies')

# Reading this property scrolls the schedule section into view
wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@id="all_schedule"]'))).location_once_scrolled_into_view
table_with_score_links = wait.until(EC.presence_of_element_located((By.XPATH, '//table[@id="schedule"]')))
# print(table_with_score_links.get_attribute('outerHTML'))

# Keep only the href of anchors whose visible text is 'Box Score'
links_from_table = [x.get_attribute('href') for x in table_with_score_links.find_elements(By.TAG_NAME, 'a') if x.text == 'Box Score']
print(links_from_table)
```
Result printed in the terminal:
```
clicked cookie parent
accepted cookies
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
```
I tried to make the variable names as descriptive as possible and left some commented-out lines in place to show the thought process of building up to the end goal.
You can now visit those links one by one and scrape the table on each page.
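Navigating to each collected URL with `browser.get` avoids clicking table rows altogether, which sidesteps the "not clickable" problem from the question. This is a minimal sketch of that loop. It assumes `browser` is the WebDriver from the answer and `links` is the list it builds. `parse_page` is a hypothetical callback, for example one that passes `browser.page_source` to `pandas.read_html`. `FakeBrowser` is a stand-in so the sketch runs without Chrome.

```python
def scrape_links(browser, links, parse_page):
    """Visit each URL in `links` and collect parse_page(page_source) results.

    Empty entries are skipped. Returns a dict mapping URL -> parsed result.
    """
    results = {}
    for link in links:
        if not link:
            continue
        # With the default (normal) page-load strategy, get() waits for the load event
        browser.get(link)
        results[link] = parse_page(browser.page_source)
    return results


class FakeBrowser:
    """Stand-in for a WebDriver that serves canned HTML per URL."""

    def __init__(self, pages):
        self.pages = pages
        self.page_source = ""
        self.visited = []

    def get(self, url):
        self.visited.append(url)
        self.page_source = self.pages[url]


pages = {
    "https://example.com/a": "<table>A</table>",
    "https://example.com/b": "<table>B</table>",
}
browser = FakeBrowser(pages)
data = scrape_links(browser, ["https://example.com/a", "", "https://example.com/b"], lambda html: html)
```

Keeping the parsing in a separate callback lets you test the table-reading logic on saved HTML without driving a browser.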
Selenium documentation can be found here: