Element randomly not found while using Selenium

Time:08-31

I am running the script below to review the latest filings by public companies listed on the London Stock Exchange, and I have run into a weird bug. It works about three times out of four, but it randomly fails to locate the elements I am looking for. I use WebDriverWait before every lookup so that each element is loaded before I touch it, so I am not sure what else I can do to make sure these elements are found. Any idea how I can get a 100% success rate in locating them?

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--headless")
s = Service(ChromeDriverManager().install())

driver = webdriver.Chrome(service=s, options=chrome_options)

# click the cookies disclaimer
URL = f"https://www.londonstockexchange.com/news?tab=news-explorer&period=lastmonth&page=1"
driver.get(URL)
WebDriverWait(driver, 5).until(
    EC.visibility_of_element_located((By.XPATH, "//button[@id='ccc-notify-accept']"))
).click()

while True:
    try:
        URL = f"https://www.londonstockexchange.com/news?tab=news-explorer&period=lastmonth&page=1"
        driver.get(URL)
        # find the form dropdown parent and click it
        dropdown_path = '//*[@id="news-table-results"]/div[1]/form[1]'
        WebDriverWait(driver, 5).until(
            EC.visibility_of_element_located((By.XPATH, dropdown_path))
        ).click()
        # search the dropdown for the largest viewing option (500 items) and select it
        WebDriverWait(driver, 5).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//*[contains(text(), 'Show 500 news')]")
            )
        ).find_element(By.XPATH, "..").click()
        # wait for the filing table to appear
        WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.TAG_NAME, "table")))
        print("Success")
    except:
        print("Failure")
        raise

The elements that are periodically not found are usually:

//*[@id="news-table-results"]/div[1]/form[1]
//*[contains(text(), 'Show 500 news')]

CodePudding user response:

I see you're using an f-string for a static URL; I'm not sure why. Pulling 500 news items takes longer than the 5 seconds you're waiting for that table to load. So either wait longer, or just scrape the 20-item pages until you have all 500 (that would be 25 pages).
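As a rough illustration of the second option, the page URLs can be generated up front. This is only a sketch: `page_urls` and `BASE_URL` are made-up names, and the 20-items-per-page figure is simply the one mentioned above, so adjust it if the site behaves differently.

```python
import math

# URL pattern from the question, with the page number left as a placeholder.
BASE_URL = "https://www.londonstockexchange.com/news?tab=news-explorer&period=lastmonth&page={page}"


def page_urls(total_items, per_page=20, base_url=BASE_URL):
    """Return the page URLs needed to cover total_items rows."""
    # Round up so a partially filled last page is still fetched.
    n_pages = math.ceil(total_items / per_page)
    return [base_url.format(page=p) for p in range(1, n_pages + 1)]


urls = page_urls(500)
print(len(urls))  # 25
print(urls[-1])   # the URL ending in page=25
```

Each URL can then be passed to `get()` in turn, with no need to touch the dropdown at all.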

Anyhow, the following code will pull all those news items. I break out of the while loop after the 5th page, but you can remove that condition or increase the number of pages:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")

webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
big_df = pd.DataFrame()
counter = 1
while True:
    url = f'https://www.londonstockexchange.com/news?tab=news-explorer&period=lastmonth&page={counter}'
    browser.get(url)
    try:
        WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.ID, 'ccc-notify-accept'))).click()
    except Exception as e:
        pass ## no cookies, so this pass is acceptable in this instance
    try:
        t.sleep(2)
        WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.ID, 'dropdownSize'))).click()
#         print('clicked the dropdown')
        t.sleep(10)
        WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, '//span[text() = " Show 500 news "]'))).click()
#         print('selected 500 news')
        t.sleep(10)
    except Exception as e:
        print(e)
    try:
        table = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR,'.full-width.table')))
        df = pd.read_html(table.get_attribute('outerHTML'))[0]
        big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
        counter = counter + 1
        if counter > 5:
            break
    except Exception as e:
        print(e)
        break
display(big_df) ## display() exists in Jupyter/IPython; use print(big_df) in a plain script

The result is a dataframe with 2,500 rows:

     | Headline | Source | Date | Time | Price | Change %
0    | Bushveld Minerals Limited - BMN - Total Voting Rights RNS 31 August 2022 08:35:04 5.90 - | RNS | 31.08.22 | 08:35:04 | 5.90 | -
1    | Sampo PLC - 95BM - Tender Offer RNS 31 August 2022 08:34:57 - - | RNS | 31.08.22 | 08:34:57 | - | -
2    | Oil and Gas Development Company Ltd - OGDC - Material Information RNS 31 August 2022 08:20:50 5.00 - | RNS | 31.08.22 | 08:20:50 | 5.00 | -
3    | Petra Diamonds Limited - PDL - Notification of FY22 Prelim Results Release PRN 31 August 2022 08:09:00 97.00 - | PRN | 31.08.22 | 08:09:00 | 97.00 | -
4    | F&C Investment Trust PLC - FCIT - Transaction in Own Shares RNS 31 August 2022 08:08:02 895.00 -0.33% | RNS | 31.08.22 | 08:08:02 | 895.00 | -0.33%
...  | ... | ... | ... | ... | ... | ...
2495 | RM PLC - RM. - RM plc: Holiding(s) in Company EQS 30 August 2022 08:14:22 46.50 - | EQS | 30.08.22 | 08:14:22 | 46.50 | -
2496 | NatWest Group plc - NWG - Share consolidation and total voting rights RNS 30 August 2022 08:10:44 249.05 0.02% | RNS | 30.08.22 | 08:10:44 | 249.05 | 0.02%
2497 | Harvest Minerals Limited - HMI - Result of AGM RNS 30 August 2022 08:06:30 11.40 -1.30% | RNS | 30.08.22 | 08:06:30 | 11.40 | -1.30%
2498 | Speedy Hire PLC - SDY - Director Declaration RNS 30 August 2022 08:00:04 41.00 - | RNS | 30.08.22 | 08:00:04 | 41.00 | -
2499 | X5 Retail Group N.V. - FIVE - X5 upgrades Paket by X5 subscription service RNS 30 August 2022 08:00:04 0.53 - | RNS | 30.08.22 | 08:00:04 | 0.53 | -
2500 rows × 6 columns
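
If the lookups still fail now and then, one option is to wrap each step in a small retry helper rather than relying only on fixed `t.sleep` calls. The sketch below is an illustration, not part of Selenium: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary assumptions. The `flaky` function only simulates an element that appears on the third try.

```python
import time


def retry(action, attempts=3, delay=2.0, exceptions=(Exception,)):
    """Call action() until it succeeds or the attempts run out.

    The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)  # give the page time to settle before retrying


# Stand-in for a lookup that fails twice before the element shows up.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("element not found yet")
    return "ok"


print(retry(flaky, attempts=5, delay=0))  # ok

# Hypothetical use with the code above (assumes `browser` already exists):
# from selenium.common.exceptions import TimeoutException
# retry(lambda: WebDriverWait(browser, 20).until(
#     EC.element_to_be_clickable((By.ID, 'dropdownSize'))).click(),
#     exceptions=(TimeoutException,))
```

Limiting `exceptions` to the errors you expect, such as `TimeoutException`, keeps genuine bugs from being silently retried.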