I want to open a set of web pages with Selenium and fetch information from each one. The URLs are stored in a list (vro_list), and the value fetched from each URL is appended to another list (roe_list). Website links I'm accessing:
It seems the website is blocking the Selenium web driver. The browser also shows the banner 'Brave is being controlled by automated software.' The failure happened on the second iteration of the for loop; the first iteration ran fine and returned the desired result.
How can I bypass this to fetch the required information? Please help.
Here is the error message from the Jupyter console:
NoSuchElementException Traceback (most recent call last)
Input In [29], in <cell line: 8>()
8 for url_link in vro_list:
9 print(url_link)
---> 10 roe_item = fetch_roe(url_link)
11 roe_list.append(roe_item)
12 time.sleep(5)
Input In [27], in fetch_roe(link)
6 time.sleep(5)
8 #name = browser.find_element('xpath', '/html/body/div[3]/h1/span')
----> 9 roe = browser.find_element('xpath', '/html/body/section[2]/div/div/div[1]/div/div[4]/section[1]/div[1]/div/div[2]/div[2]/div/div/div[2]/table/tbody/tr[2]/td[2]/div')
11 return roe.text
File ~\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py:855, in WebDriver.find_element(self, by, value)
852 by = By.CSS_SELECTOR
853 value = '[name="%s"]' % value
--> 855 return self.execute(Command.FIND_ELEMENT, {
856 'using': by,
857 'value': value})['value']
File ~\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py:428, in WebDriver.execute(self, driver_command, params)
426 response = self.command_executor.execute(driver_command, params)
427 if response:
--> 428 self.error_handler.check_response(response)
429 response['value'] = self._unwrap_value(
430 response.get('value', None))
431 return response
File ~\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py:243, in ErrorHandler.check_response(self, response)
241 alert_text = value['alert'].get('text')
242 raise exception_class(message, screen, stacktrace, alert_text) # type: ignore[call-arg] # mypy is not smart enough here
--> 243 raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/section[2]/div/div/div[1]/div/div[4]/section[1]/div[1]/div/div[2]/div[2]/div/div/div[2]/table/tbody/tr[2]/td[2]/div"}
(Session info: chrome=105.0.5195.102)
Stacktrace:
Backtrace:
Ordinal0 [0x00A2C0A3 2212003]
Ordinal0 [0x009C2CC1 1780929]
Ordinal0 [0x008D465D 804445]
Ordinal0 [0x00903475 996469]
Ordinal0 [0x0090363B 996923]
Ordinal0 [0x00931382 1184642]
Ordinal0 [0x0091EC64 1109092]
Ordinal0 [0x0092F5B2 1177010]
Ordinal0 [0x0091EA36 1108534]
Ordinal0 [0x008F83C9 951241]
Ordinal0 [0x008F9396 955286]
GetHandleVerifier [0x00CD9CE2 2746722]
GetHandleVerifier [0x00CCA234 2682548]
GetHandleVerifier [0x00ABB34A 524234]
GetHandleVerifier [0x00AB9B60 518112]
Ordinal0 [0x009C9FBC 1810364]
Ordinal0 [0x009CEA28 1829416]
Ordinal0 [0x009CEB15 1829653]
Ordinal0 [0x009D8744 1869636]
BaseThreadInitThunk [0x76A4FA29 25]
RtlGetAppContainerNamedObjectPath [0x77C07A9E 286]
RtlGetAppContainerNamedObjectPath [0x77C07A6E 238]
CodePudding user response:
This is one way of getting the information from those pages. The setup below is on Linux, so you will need a working setup for your own system. I'm just printing out all the tables; you can process them however you need:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import time as t
import pandas as pd
import undetected_chromedriver as uc
options = uc.ChromeOptions()
# options.add_argument("--no-sandbox")
# options.add_argument('--disable-notifications')
options.add_argument("--window-size=1280,720")
browser = uc.Chrome(options=options)
actions = ActionChains(browser)
wait = WebDriverWait(browser, 20)
url_list = ['https://www.valueresearchonline.com/stocks/44052/reliance-industries-ltd', 'https://www.valueresearchonline.com/stocks/44811/tata-consultancy-services-ltd']
for url in url_list:
    browser.get(url)
    print(url)
    dfs = pd.read_html(browser.page_source)
    for df in dfs:
        display(df)
    t.sleep(5)
Result printed in terminal:
https://www.valueresearchonline.com/stocks/44052/reliance-industries-ltd
Unnamed: 0 YTD 1 Month 3 Months 1 Year 3 Years 5 Years 10 Years
0 Reliance 9.62 0.19 -4.36 7.02 28.52 26.22 20.82
1 S&P BSE Sensex 3.18 1.30 10.68 3.09 17.27 13.52 12.91
2 # -- -- -- -- -- -- --
Unnamed: 0 2021 2020 2019 2018 2017 2016 2015
0 Reliance 19.15 32.76 35.06 23.25 70.19 6.60 14.27
1 S&P BSE Sensex 21.99 15.75 14.38 5.87 27.91 1.95 -5.03
2 S&P BSE Sensex 21.99 15.75 14.38 5.87 27.91 1.95 -5.03
Unnamed: 0 Stock Peer Median Unnamed: 3
Unnamed: 0 Stock Peer Median Unnamed: 3
0 P/E 26.18 16.17 Created with Highcharts 9.2.2
1 P/B 2.18 1.19 Created with Highcharts 9.2.2
2 Dividend Yield 0.31 2.72 Created with Highcharts 9.2.2
Unnamed: 0 Stock Peer Median Unnamed: 3
Unnamed: 0 Stock Peer Median Unnamed: 3
0 TTM EPS YoY change (%) 32.73 3.03 Created with Highcharts 9.2.2
1 Returns on Equity 9.63 11.12 Created with Highcharts 9.2.2
2 Piotroski F-Score 7.00 -- NaN
https://www.valueresearchonline.com/stocks/44811/tata-consultancy-services-ltd
Unnamed: 0 YTD 1 Month 3 Months 1 Year 3 Years 5 Years 10 Years
0 Tata Consultancy Services -13.38 -5.39 -3.63 -14.60 14.55 21.41 16.61
1 S&P BSE Sensex 3.20 1.32 10.70 3.10 17.27 13.52 12.91
2 S&P BSE IT -21.61 -2.99 0.44 -13.56 23.07 24.54 17.26
Unnamed: 0 2021 2020 2019 2018 2017 2016 2015
0 Tata Consultancy Services 27.66 32.07 13.61 43.11 14.19 -2.10 -4.27
1 S&P BSE Sensex 21.99 15.75 14.38 5.87 27.91 1.95 -5.03
2 S&P BSE IT 56.07 56.68 9.84 24.78 10.83 -8.00 4.51
Unnamed: 0 Stock Peer Median Unnamed: 3
Unnamed: 0 Stock Peer Median Unnamed: 3
0 P/E 30.34 23.56 Created with Highcharts 9.2.2
1 P/B 11.99 3.95 Created with Highcharts 9.2.2
2 Dividend Yield 1.34 0.96 Created with Highcharts 9.2.2
Unnamed: 0 Stock Peer Median Unnamed: 3
Unnamed: 0 Stock Peer Median Unnamed: 3
0 TTM EPS YoY change (%) 14.01 19.21 Created with Highcharts 9.2.2
1 Returns on Equity 41.85 19.02 Created with Highcharts 9.2.2
2 Piotroski F-Score 7.00 -- NaN
undetected_chromedriver documentation: https://pypi.org/project/undetected-chromedriver/
Selenium documentation: https://www.selenium.dev/documentation/
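The question was after a single figure, the return on equity, rather than every table. Here is a minimal sketch of pulling it out of the parsed tables, assuming the row label 'Returns on Equity' and the column order shown in the output above (label first, stock value second). `find_metric` is a hypothetical helper name, not part of pandas or Selenium:

```python
import pandas as pd


def find_metric(tables, label):
    """Return the second-column value of the first row whose first column equals label."""
    for df in tables:
        if df.shape[1] < 2:
            continue  # need a label column and a value column
        # Compare on the first column by position, so the header layout does not matter.
        labels = df.iloc[:, 0].astype(str).str.strip()
        hits = df[labels == label]
        if not hits.empty:
            return hits.iloc[0, 1]
    return None  # label not present in any of the tables
```

Inside the loop above, this could be called as find_metric(dfs, 'Returns on Equity') in place of displaying every table.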
CodePudding user response:
The error occurs because the intended element is not being found, due to bot detection.
Unfortunately, there is no mechanism to bypass hCaptcha. You can use a captcha-solving service such as 2captcha.
You could also try a module like undetected-chromedriver to see whether it gets past the captcha.
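Whichever route you take, the loop in the question stops at the first page where the element is missing. Below is a minimal sketch of keeping the loop going instead. It does not bypass any detection; it only retries and records None when a page could not be read. `fetch_or_none` and its parameters are hypothetical names made up for illustration:

```python
import time


def fetch_or_none(fetch, url, retries=2, delay=5.0, errors=(Exception,)):
    """Call fetch(url); retry on the given errors, return None if every attempt fails."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except errors:
            # Pause before the next attempt, but not after the last one.
            if attempt < retries:
                time.sleep(delay)
    return None
```

With the question's code this would look like roe_list.append(fetch_or_none(fetch_roe, url_link, errors=(NoSuchElementException,))), where NoSuchElementException is imported from selenium.common.exceptions. Passing a narrow exception tuple keeps unrelated bugs from being silently swallowed.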