I'm trying to scrape a paginated table with Selenium. The website I'm scraping does not expose the page number in the URL.
table = '//*[@id="result-tables"]/div[2]/div[2]/div/table/tbody'
home = driver.find_elements(By.XPATH, '//tbody/tr/td[5]')
away = driver.find_elements(By.XPATH, '//tbody/tr/td[7]')
teams = []
page = 0
while page < 10:
    page += 1
    time.sleep(5)
    for i in range(len(home)):
        temp_data = home[i].text + '\n' + away[i].text
        pair = teams.append(temp_data)
    next_page = driver.find_element(By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span').click()
teams = []
It stores only the data from the first page. When the script moves to the next page, I get this error:
Traceback (most recent call last):
File "C:\Users\XXX\OneDrive\Documents\A\b\s_pc.py", line 49, in <module>
temp_data = home[i].text + '\n' + away[i].text
File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 76, in text
return self._execute(Command.GET_ELEMENT_TEXT)['value']
File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 693, in _execute
return self._parent.execute(command, params)
File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 418, in execute
self.error_handler.check_response(response)
File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=96.0.4664.45)
Stacktrace:
CodePudding user response:
I have defined the home and away elements inside the while loop, and also moved the time.sleep() to the beginning of the while loop. With these changes the code didn't throw any errors. Check whether this works as expected.
import time
from selenium.webdriver.common.by import By

table = '//*[@id="result-tables"]/div[2]/div[2]/div/table/tbody'
teams = []
page = 0
while page < 10:
    time.sleep(5)  # give the new page of results time to load
    # Re-locate the cells on every page so the references never go stale
    home = driver.find_elements(By.XPATH, '//tbody/tr/td[5]')
    away = driver.find_elements(By.XPATH, '//tbody/tr/td[7]')
    page += 1
    for i in range(len(home)):
        temp_data = home[i].text + '\n' + away[i].text
        teams.append(temp_data)
    # Click the pagination link to move to the next page
    driver.find_element(By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span').click()
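
The StaleElementReferenceException occurs because clicking the pagination link replaces the table rows in the DOM, so WebElement references collected before the click no longer point to attached nodes; locating home and away inside the loop, as above, avoids that. If the fixed time.sleep(5) turns out to be flaky, an explicit wait can make the same loop more robust. Below is a minimal sketch, assuming the XPaths from the question and a page that swaps out the table rows in place when the pagination link is clicked; the URL, the 10-page limit, and the last-page handling are placeholders, not part of the original site.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://example.com/results')  # placeholder URL

wait = WebDriverWait(driver, 10)
teams = []

for _ in range(10):  # scrape up to 10 pages (sketch: no last-page check)
    # Wait until the table cells are present instead of sleeping a fixed time
    wait.until(EC.presence_of_all_elements_located((By.XPATH, '//tbody/tr/td[5]')))

    # Re-locate the cells on every iteration so the references are never stale
    home = driver.find_elements(By.XPATH, '//tbody/tr/td[5]')
    away = driver.find_elements(By.XPATH, '//tbody/tr/td[7]')
    teams.extend(h.text + '\n' + a.text for h, a in zip(home, away))

    # Remember a cell from the current page, click "next", then wait for that
    # cell to go stale, i.e. for the new page of rows to replace the old one
    old_cell = home[0]
    driver.find_element(
        By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span').click()
    wait.until(EC.staleness_of(old_cell))

Waiting for EC.staleness_of the old cell, rather than sleeping, ties the loop to the actual page swap, so it proceeds as soon as the new rows are loaded and fails loudly (with a TimeoutException) if the pagination click did nothing.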