Using Python and Selenium, I am unable to scrape any of the JavaScript-inserted data on this website. I have tested my code on other websites with dynamically generated content and it works there, so I'm wondering whether the way the JS is generated on this particular site is the cause.
I want to find out the stock availability of this item on the website. The code below raises the following exception: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@id="theList"]/article[1]//div[@]/div/text()"}
However, I know this XPath works, because I tried it in Chrome DevTools.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.common.exceptions import TimeoutException
service = Service(ChromeDriverManager().install())
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(service=service, options=options)
driver.get('https://www.deejay.de/velvet velour')
elem = driver.find_element(
    By.XPATH, '//div[@id="theList"]/article[1]//div[@]/div[1]')
print(elem.text)
After reading related SO posts, I thought I might need to tell the driver explicitly to wait for the page to load, so I tried this:
service = Service(ChromeDriverManager().install())
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(service=service, options=options)
driver.get('https://www.deejay.de/velvet velour')
try:
    elem = WebDriverWait(driver=driver, timeout=20).until(
        lambda x: x.find_element(By.XPATH, '//div[@id="theList"]/article[1]//div[@]/div[1]'))
    print(elem.text)
except TimeoutException:
    print('took too long')
But it always times out and prints my 'took too long' message.
Any ideas why this might not be working? As I said, I can get JS-generated content from other websites, but not from deejay.de.
Answer:
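You need to switch to the iframe first. The listing is rendered inside an iframe, and Selenium only searches the current browsing context, so find_element cannot see elements inside a frame until you switch into it: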
iframe = WebDriverWait(driver, 30).until(EC.presence_of_element_located(
    (By.XPATH, "//iframe[@id='myIframe']")))
driver.switch_to.frame(iframe)
Then find all the items and print their stock availability like this:
items = driver.find_elements(By.XPATH, "//div[@class='order']/div[1]")
for item in items:
    print(item.text)
Finally, switch back to the default content to access elements outside the iframe:
driver.switch_to.default_content()
Note: you will also need the following import:
from selenium.webdriver.support import expected_conditions as EC