I am trying to create an application that scrapes certain e-commerce websites. I am using Selenium for this and am deploying the application on an EC2 instance running CentOS. I developed the code locally, where it works, but it fails on the remote machine.
The code I am using:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
ser = Service(ChromeDriverManager().install())
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
selenium_driver = webdriver.Chrome(service=ser, options=chrome_options)
url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'
selenium_driver.get(url)
title = selenium_driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
print(title.text)
When I run this code on the remote machine, I get the following stack trace:
Traceback (most recent call last):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2076, in wsgi_app
    response = self.handle_exception(e)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/ec2-user/price_tracker/flask_api.py", line 22, in home
    title, price, isSizeAvailable, shop = prices.checkInfoByShop(url, size)
  File "/home/ec2-user/price_tracker/check_prices.py", line 132, in checkInfoByShop
    secondaryPriceXPath=secondaryPriceXPath)
  File "/home/ec2-user/price_tracker/check_prices.py", line 61, in checkSelenium
    title = self.selenium_driver.find_element(By.XPATH, titleXPath)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 1246, in find_element
    'value': value})['value']
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span"}
(Session info: headless chrome=96.0.4664.110)
Stacktrace:
#0 0x559979e8dee3 <unknown>
#1 0x55997995b608 <unknown>
#2 0x559979991aa1 <unknown>
#3 0x559979991c61 <unknown>
#4 0x5599799c4714 <unknown>
#5 0x5599799af29d <unknown>
#6 0x5599799c23bc <unknown>
#7 0x5599799af163 <unknown>
#8 0x559979984bfc <unknown>
#9 0x559979985c05 <unknown>
#10 0x559979ebfbaa <unknown>
#11 0x559979ed5651 <unknown>
#12 0x559979ec0b05 <unknown>
#13 0x559979ed6a68 <unknown>
#14 0x559979eb505f <unknown>
#15 0x559979ef1818 <unknown>
#16 0x559979ef1998 <unknown>
#17 0x559979f0ceed <unknown>
#18 0x7ff5dd53b40b <unknown>
For debugging purposes, I tried to read the entire body of the webpage using
body = selenium_driver.find_element(By.XPATH, '/html/body')
print(body.text)
which returns
"We're sorry, something has gone wrong. Please try again.\nIf you continue to have trouble, please contact us at [email protected].\nChecking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.\nPlease allow up to 5 seconds…\nDebugging Information\nIP Address\n<ip-address>\nRay ID\n6c57184d797805a0"
I understand that my request might be getting blocked for some reason, but is there a way to bypass this?
I have tried adding wait statements in the hope of landing on the redirected page, but nothing has worked so far.
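The body text printed above can be checked programmatically, so the scraper reports that it landed on the browser-check page instead of failing later with NoSuchElementException. This is a minimal sketch, assuming the marker phrases (copied from that body text) stay stable; the helper name `looks_like_challenge_page` is hypothetical:

```python
# Marker phrases copied from the interstitial body text above.
# Assumption: the check page keeps using this wording.
CHALLENGE_MARKERS = (
    "Checking your browser before accessing",
    "Please allow up to 5 seconds",
)


def looks_like_challenge_page(body_text: str) -> bool:
    """Return True if the visible body text matches the browser-check page."""
    return any(marker in body_text for marker in CHALLENGE_MARKERS)
```

For example, call it with `selenium_driver.find_element(By.XPATH, '/html/body').text` right after `selenium_driver.get(url)` and log or retry when it returns True.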
Answer:
That message suggests the page content has been replaced by a browser-check page, so your code is doing what it was told; the element simply is not on the page yet. I'd have Selenium wait for the element to become visible (read more here). Alternatively, you can wait for the page to redirect; how to do that is answered in another SO question here.
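Waiting for the redirect can be written as a custom wait condition: `WebDriverWait.until()` accepts any callable that takes the driver, polls it until it returns a truthy value, and raises `TimeoutException` if the timeout expires first. This is a minimal sketch, assuming the phrase "Checking your browser before accessing" from the question appears verbatim in the check page's HTML; `challenge_cleared` is a hypothetical name. It only helps if the check actually completes in headless Chrome; if the site keeps serving the check page to the EC2 instance, the wait will simply time out.

```python
# Phrase taken from the interstitial text in the question; assumed to appear
# verbatim in the page's HTML while the browser check is running.
DEFAULT_MARKER = "Checking your browser before accessing"


def challenge_cleared(marker: str = DEFAULT_MARKER):
    """Build a condition suitable for WebDriverWait.until().

    The returned callable receives the driver on every poll and returns
    True once the marker no longer appears in the current page source.
    """
    def _condition(driver) -> bool:
        # driver.page_source is the HTML of the page currently loaded
        return marker not in driver.page_source

    return _condition
```

Usage: `WebDriverWait(selenium_driver, 15).until(challenge_cleared())`, then locate the title element as before. The 15-second timeout is an arbitrary choice.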
Answer:
I'd suggest using WebDriverWait to wait for the page to load.
wait = WebDriverWait(selenium_driver, 10)  # second argument is the timeout in seconds
elem = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')))
print(elem.text)
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC