Scrape table data with pagination

Time:11-17

I'm trying to scrape a table with Selenium where the results are paginated. The website I'm scraping doesn't put the page number in the URL.

table = '//*[@id="result-tables"]/div[2]/div[2]/div/table/tbody'

home = driver.find_elements(By.XPATH, '//tbody/tr/td[5]')
away = driver.find_elements(By.XPATH, '//tbody/tr/td[7]')

teams = []

page = 0
while page < 10:
    page += 1
    time.sleep(5)
    for i in range(len(home)):
        temp_data = home[i].text + '\n' + away[i].text
        pair = teams.append(temp_data)

    next_page = driver.find_element(By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span').click()

The teams list ends up with only the data from the first page. When the script moves to the next page, I get this error:

Traceback (most recent call last):
  File "C:\Users\XXX\OneDrive\Documents\A\b\s_pc.py", line 49, in <module>
    temp_data = home[i].text + '\n' + away[i].text
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 76, in text
    return self._execute(Command.GET_ELEMENT_TEXT)['value']
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 693, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 418, in execute
    self.error_handler.check_response(response)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=96.0.4664.45)
Stacktrace:

CodePudding user response:

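I defined the home and away elements inside the while loop, so they are looked up again on every page. The references collected before the loop point at the first page's table and become stale once the next page is rendered, which is what raises StaleElementReferenceException. I also moved time.sleep() to the beginning of the while loop. With these changes the code didn't throw any error.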

Check if this is working as expected.

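import time

from selenium.webdriver.common.by import By

# `driver` is the already-initialised WebDriver from the question.
table = '//*[@id="result-tables"]/div[2]/div[2]/div/table/tbody'

teams = []

page = 0
while page < 10:
    # Give the current page time to load before reading it.
    time.sleep(5)
    # Look the cells up again on every page; references from the previous page are stale.
    home = driver.find_elements(By.XPATH, '//tbody/tr/td[5]')
    away = driver.find_elements(By.XPATH, '//tbody/tr/td[7]')
    page += 1

    for i in range(len(home)):
        temp_data = home[i].text + '\n' + away[i].text
        teams.append(temp_data)

    # Click the pagination link that the question uses to move to the next page.
    driver.find_element(By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span').click()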
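
The loop above can also be written so that the "read the page" and "go to the next page" steps are passed in as functions. The following is only a sketch of that idea, not the original poster's code: scrape_pages, read_fake_page, go_to_next_fake_page and the fake page data are hypothetical names used for illustration, and a small in-memory stand-in replaces the browser so the snippet runs without Selenium.

```python
from typing import Callable, List


def scrape_pages(
    read_page: Callable[[], List[str]],
    go_next: Callable[[], bool],
    max_pages: int = 10,
) -> List[str]:
    """Collect rows page by page, re-reading the page on every iteration."""
    rows: List[str] = []
    for _ in range(max_pages):
        # Read the rows fresh on each page: anything looked up on an
        # earlier page no longer matches the table that is now shown.
        rows.extend(read_page())
        # Stop early when there is no further page to move to.
        if not go_next():
            break
    return rows


# Stand-in for the browser (hypothetical data): three pages of "home\naway" pairs.
fake_pages = [["A\nB", "C\nD"], ["E\nF"], ["G\nH"]]
state = {"index": 0}


def read_fake_page() -> List[str]:
    # Return a copy of the rows on the page currently "displayed".
    return list(fake_pages[state["index"]])


def go_to_next_fake_page() -> bool:
    # Report False once the last page has been reached.
    if state["index"] + 1 >= len(fake_pages):
        return False
    state["index"] += 1
    return True


teams = scrape_pages(read_fake_page, go_to_next_fake_page)
print(teams)
```

With Selenium, read_page would presumably do the two driver.find_elements lookups and build the "home\naway" strings, and go_next would click the pagination link and return False when that link can no longer be found. Replacing the fixed time.sleep(5) with an explicit wait (for example WebDriverWait together with expected_conditions.staleness_of on a row from the old page) may also make the loop faster and less timing-dependent, though that has not been tested against this site.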