Selenium (Python) Webdriver can't access a website

Time:09-13

I want to access a website and fetch information using Selenium. I pass the URLs in a list (vro_list) and collect the value fetched from each URL in another list (roe_list).

It seems like the website is blocking the access of the Selenium web driver. You can also see 'Brave is being controlled by automated software.' This happened on the second iteration of the for loop. The first iteration ran fine and I got the desired result.

How can I bypass this to fetch the required information? Please help.

Here is the error message from the Jupyter console:

NoSuchElementException                    Traceback (most recent call last)
Input In [29], in <cell line: 8>()
      8 for url_link in vro_list:
      9     print(url_link)
---> 10     roe_item = fetch_roe(url_link)
     11     roe_list.append(roe_item)
     12     time.sleep(5)

Input In [27], in fetch_roe(link)
      6 time.sleep(5)
      8 #name = browser.find_element('xpath', '/html/body/div[3]/h1/span')
----> 9 roe = browser.find_element('xpath', '/html/body/section[2]/div/div/div[1]/div/div[4]/section[1]/div[1]/div/div[2]/div[2]/div/div/div[2]/table/tbody/tr[2]/td[2]/div')
     11 return roe.text

File ~\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py:855, in WebDriver.find_element(self, by, value)
    852     by = By.CSS_SELECTOR
    853     value = '[name="%s"]' % value
--> 855 return self.execute(Command.FIND_ELEMENT, {
    856     'using': by,
    857     'value': value})['value']

File ~\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py:428, in WebDriver.execute(self, driver_command, params)
    426 response = self.command_executor.execute(driver_command, params)
    427 if response:
--> 428     self.error_handler.check_response(response)
    429     response['value'] = self._unwrap_value(
    430         response.get('value', None))
    431     return response

File ~\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py:243, in ErrorHandler.check_response(self, response)
    241         alert_text = value['alert'].get('text')
    242     raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 243 raise exception_class(message, screen, stacktrace)

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/section[2]/div/div/div[1]/div/div[4]/section[1]/div[1]/div/div[2]/div[2]/div/div/div[2]/table/tbody/tr[2]/td[2]/div"}
  (Session info: chrome=105.0.5195.102)
Stacktrace:
Backtrace:
    Ordinal0 [0x00A2C0A3 2212003]
    Ordinal0 [0x009C2CC1 1780929]
    Ordinal0 [0x008D465D 804445]
    Ordinal0 [0x00903475 996469]
    Ordinal0 [0x0090363B 996923]
    Ordinal0 [0x00931382 1184642]
    Ordinal0 [0x0091EC64 1109092]
    Ordinal0 [0x0092F5B2 1177010]
    Ordinal0 [0x0091EA36 1108534]
    Ordinal0 [0x008F83C9 951241]
    Ordinal0 [0x008F9396 955286]
    GetHandleVerifier [0x00CD9CE2 2746722]
    GetHandleVerifier [0x00CCA234 2682548]
    GetHandleVerifier [0x00ABB34A 524234]
    GetHandleVerifier [0x00AB9B60 518112]
    Ordinal0 [0x009C9FBC 1810364]
    Ordinal0 [0x009CEA28 1829416]
    Ordinal0 [0x009CEB15 1829653]
    Ordinal0 [0x009D8744 1869636]
    BaseThreadInitThunk [0x76A4FA29 25]
    RtlGetAppContainerNamedObjectPath [0x77C07A9E 286]
    RtlGetAppContainerNamedObjectPath [0x77C07A6E 238]

CodePudding user response:

Here is one way to get the information from those pages. My setup is on Linux, so adapt it to your own system. I'm just printing every table; you can process them however you need:

from io import StringIO
import time as t

import pandas as pd
import undetected_chromedriver as uc


options = uc.ChromeOptions()
# options.add_argument("--no-sandbox")
# options.add_argument('--disable-notifications')
options.add_argument("--window-size=1280,720")

# undetected_chromedriver is a patched chromedriver that aims to avoid triggering anti-bot checks
browser = uc.Chrome(options=options)

url_list = ['https://www.valueresearchonline.com/stocks/44052/reliance-industries-ltd', 'https://www.valueresearchonline.com/stocks/44811/tata-consultancy-services-ltd']

for url in url_list:
    browser.get(url)
    print(url)
    # Parse every <table> in the rendered page into a DataFrame.
    # Wrapping the HTML in StringIO avoids the pandas deprecation warning for literal HTML strings.
    dfs = pd.read_html(StringIO(browser.page_source))
    for df in dfs:
        print(df)  # in Jupyter, display(df) gives richer output
    t.sleep(5)  # pause between requests

browser.quit()

Result printed in terminal:

https://www.valueresearchonline.com/stocks/44052/reliance-industries-ltd
Unnamed: 0  YTD 1 Month 3 Months    1 Year  3 Years 5 Years 10 Years
0   Reliance    9.62    0.19    -4.36   7.02    28.52   26.22   20.82
1   S&P BSE Sensex  3.18    1.30    10.68   3.09    17.27   13.52   12.91
2   #   --  --  --  --  --  --  --
Unnamed: 0  2021    2020    2019    2018    2017    2016    2015
0   Reliance    19.15   32.76   35.06   23.25   70.19   6.60    14.27
1   S&P BSE Sensex  21.99   15.75   14.38   5.87    27.91   1.95    -5.03
2   S&P BSE Sensex  21.99   15.75   14.38   5.87    27.91   1.95    -5.03
Unnamed: 0  Stock   Peer Median Unnamed: 3
Unnamed: 0  Stock   Peer Median Unnamed: 3
0   P/E 26.18   16.17   Created with Highcharts 9.2.2
1   P/B 2.18    1.19    Created with Highcharts 9.2.2
2   Dividend Yield  0.31    2.72    Created with Highcharts 9.2.2
Unnamed: 0  Stock   Peer Median Unnamed: 3
Unnamed: 0  Stock   Peer Median Unnamed: 3
0   TTM EPS YoY change (%)  32.73   3.03    Created with Highcharts 9.2.2
1   Returns on Equity   9.63    11.12   Created with Highcharts 9.2.2
2   Piotroski F-Score   7.00    --  NaN
https://www.valueresearchonline.com/stocks/44811/tata-consultancy-services-ltd
Unnamed: 0  YTD 1 Month 3 Months    1 Year  3 Years 5 Years 10 Years
0   Tata Consultancy Services   -13.38  -5.39   -3.63   -14.60  14.55   21.41   16.61
1   S&P BSE Sensex  3.20    1.32    10.70   3.10    17.27   13.52   12.91
2   S&P BSE IT  -21.61  -2.99   0.44    -13.56  23.07   24.54   17.26
Unnamed: 0  2021    2020    2019    2018    2017    2016    2015
0   Tata Consultancy Services   27.66   32.07   13.61   43.11   14.19   -2.10   -4.27
1   S&P BSE Sensex  21.99   15.75   14.38   5.87    27.91   1.95    -5.03
2   S&P BSE IT  56.07   56.68   9.84    24.78   10.83   -8.00   4.51
Unnamed: 0  Stock   Peer Median Unnamed: 3
Unnamed: 0  Stock   Peer Median Unnamed: 3
0   P/E 30.34   23.56   Created with Highcharts 9.2.2
1   P/B 11.99   3.95    Created with Highcharts 9.2.2
2   Dividend Yield  1.34    0.96    Created with Highcharts 9.2.2
Unnamed: 0  Stock   Peer Median Unnamed: 3
Unnamed: 0  Stock   Peer Median Unnamed: 3
0   TTM EPS YoY change (%)  14.01   19.21   Created with Highcharts 9.2.2
1   Returns on Equity   41.85   19.02   Created with Highcharts 9.2.2
2   Piotroski F-Score   7.00    --  NaN

For undetected_chromedriver documentation, see https://pypi.org/project/undetected-chromedriver/ and for Selenium documentation, see https://www.selenium.dev/documentation/
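
To pull a single figure such as Returns on Equity out of the tables printed above, rather than relying on a long absolute XPath, something like the following may work. It is only a minimal sketch: find_metric is a hypothetical helper (not part of pandas or Selenium), and it assumes the metric label sits in the first column and the stock's value in the second, as in the output above. The sample frame is copied from that output purely for illustration.

```python
import pandas as pd


def find_metric(dfs, label):
    """Return the second-column value of the first row whose first column equals label.

    dfs is a list of DataFrames such as the one returned by pd.read_html.
    Returns None if no table contains the label.
    """
    for df in dfs:
        if df.shape[1] < 2:
            continue  # need a label column and a value column
        first_col = df.iloc[:, 0].astype(str).str.strip()
        matches = df[first_col == label]
        if not matches.empty:
            return matches.iloc[0, 1]
    return None


# Sample table mirroring the "Returns on Equity" table in the output above.
sample = pd.DataFrame(
    {
        "Unnamed: 0": ["TTM EPS YoY change (%)", "Returns on Equity", "Piotroski F-Score"],
        "Stock": [32.73, 9.63, 7.00],
        "Peer Median": ["3.03", "11.12", "--"],
    }
)
roe = find_metric([sample], "Returns on Equity")
print(roe)
```

Inside the loop above, the same call would be find_metric(dfs, "Returns on Equity") right after pd.read_html. Matching on the row label should survive layout changes that would break a positional XPath, although it still depends on the site keeping that label text.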

CodePudding user response:

The error occurs because the intended element is not found, which is due to bot detection.

Unfortunately, Selenium has no built-in mechanism to bypass hCaptcha. You can use a captcha-solving service such as 2captcha.

You could also try a module like undetected-chromedriver to see whether it gets past the captcha.
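
Whatever the cause of the block, the loop in the question stops at the first page where the element is missing. A minimal sketch of one way to keep going and see which URLs failed is shown here. fetch_all is a hypothetical helper, not a Selenium API; it assumes the per-URL function (fetch_roe in the question) raises an exception when the page is blocked.

```python
import time


def fetch_all(urls, fetch_one, pause=5.0, errors=(Exception,)):
    """Call fetch_one(url) for every URL and store None where it raises.

    pause is the delay in seconds between requests; errors is the tuple of
    exception types treated as "this page failed, move on".
    """
    results = {}
    for i, url in enumerate(urls):
        try:
            results[url] = fetch_one(url)
        except errors as exc:
            # Record the failure instead of aborting the whole run.
            print(f"skipped {url}: {type(exc).__name__}")
            results[url] = None
        if pause and i < len(urls) - 1:
            time.sleep(pause)  # be gentle with the site between requests
    return results
```

With the question's names this could be called as fetch_all(vro_list, fetch_roe, errors=(NoSuchElementException,)), where NoSuchElementException is imported from selenium.common.exceptions. URLs that map to None are the ones worth retrying or inspecting by hand.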
