Selenium doesn't work for a certain website

Time:02-19

I am trying to use Selenium to scrape dynamic web pages. Here, I tried to print all the authors on the page:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://quotes.toscrape.com/js")
elements = driver.find_elements_by_class_name("author")
for i in elements:
    print(i.text)
driver.quit()

This worked well and printed the expected result:

Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thomas A. Edison
Eleanor Roosevelt
Steve Martin

But when I try to use similar code for another website, I get an error:

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator
  (Session info: chrome=98.0.4758.102)

This is my second code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
url = 'https://www.myperfume.co.il/155567-כל-המותגים-לגבר?order=up_title'


driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
elements = driver.find_elements_by_class_name("title  text-center")
for i in elements:
    print(i.text)
driver.quit()

What I am trying to do in this code is print the names of all the perfumes on the page. After inspecting the page, I saw that all of the names are in elements with the class 'title text-center'.

How can I fix my code?

CodePudding user response:

The value "title text-center" is actually two class names: title and text-center.
find_elements_by_class_name accepts only a single class name, so to locate elements by two class names you have to use an XPath or a CSS selector.
So, instead of

elements = driver.find_elements_by_class_name("title  text-center")

You can use

elements = driver.find_elements_by_xpath("//h3[@class='title  text-center']")

Or

elements = driver.find_elements(By.CSS_SELECTOR, "h3.title.text-center")  # needs: from selenium.webdriver.common.by import By
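
As a small sketch of why the CSS form works: a class attribute value is a whitespace-separated list of class names, and a CSS selector chains those names with dots. The helper below (css_from_class_attr is a hypothetical name, not part of Selenium) builds such a selector from the attribute value copied out of the browser inspector, so the double space in "title  text-center" does not matter. It assumes the class names contain no characters that need CSS escaping.

```python
def css_from_class_attr(class_attr, tag=""):
    """Build a CSS selector matching elements that carry every class in class_attr.

    Hypothetical helper for illustration; not part of Selenium.
    """
    # str.split() with no argument splits on any run of whitespace,
    # so "title  text-center" (two spaces) yields exactly two names.
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr contains no class names")
    return tag + "".join("." + name for name in names)


selector = css_from_class_attr("title  text-center", tag="h3")
print(selector)  # h3.title.text-center
```

The resulting string can then be passed to driver.find_elements(By.CSS_SELECTOR, selector).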

You should also add waits so that the web elements are accessed only once they are loaded and ready.
The usual way is an explicit wait with Expected Conditions, as follows:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.myperfume.co.il/155567-כל-המותגים-לגבר?order=up_title'


driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
wait = WebDriverWait(driver, 20)

driver.get(url)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3.title.text-center")))
elements = driver.find_elements(By.CSS_SELECTOR, "h3.title.text-center")
for i in elements:
    print(i.text)
driver.quit()
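
To illustrate what an explicit wait does conceptually, here is a minimal pure-Python sketch of the polling idea: call a condition repeatedly until it returns something truthy or a timeout expires. The names wait_until and fake_find are hypothetical and exist only for this illustration. This is not Selenium's implementation, and in real scraping code WebDriverWait(driver, 20).until(...) as shown above is the thing to use.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Poll condition() until it returns a truthy value or timeout seconds pass.

    Illustrative only; not Selenium's WebDriverWait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Simulate a page whose elements only appear on the third lookup.
calls = {"n": 0}


def fake_find():
    calls["n"] += 1
    return ["perfume A", "perfume B"] if calls["n"] >= 3 else []


names = wait_until(fake_find, timeout=2.0, poll=0.01)
print(names)
```

The point of the design is that the lookup is retried instead of being run once right after the page load, which is why a wait helps on pages that render their content with JavaScript.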

CodePudding user response:

This error message...

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator

...implies that the locator you used is invalid, because By.CLASS_NAME accepts a single class name as its argument, not several names separated by spaces.


To print the names of all the perfumes on the page, you can use a list comprehension with the following locator strategy:

  • Using css_selector and get_attribute("innerHTML"):

    driver.get("https://www.myperfume.co.il/155567-כל-המותגים-לגבר?order=up_title")
    print([my_elem.get_attribute("innerHTML") for my_elem in driver.find_elements(By.CSS_SELECTOR, "h3.title")])
    

Ideally, you should use WebDriverWait with visibility_of_all_elements_located(), together with the following locator strategy:

  • Using CSS_SELECTOR and the text attribute:

    driver.get("https://www.myperfume.co.il/155567-כל-המותגים-לגבר?order=up_title")
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h3.title")))])
    
  • Console Output:

    [' 212 וי אי פי לגבר א.ד.ט 212 vip for men e.d.t ', ' 212 ניו יורק לגבר א.ד.ט 212 nyc for men e.d.t ', ' 212 סקסי לגבר א.ד.ט 212 sexy men e.d.t ', ' אברקרומבי פירס 100 מל א.ד.ק Abercrombie & Fitch Fierce 100 ml e.d.c ', ' אברקרומבי פירס 50 מל א.ד.ק Abercrombie & Fitch Fierce 50 ml e.d.c ', ' אברקרומבי פירס גודל ענק 200 מל א.ד.ק Abercrombie & Fitch Fierce 200 ml e.d.c ', ' אברקרומבי פירסט אינסטינקט לגבר א.ד.ט  Abercrombie & Fitch First Instinct e.d.t ', ' אגואיסט א.ד.ט Egoiste e.d.t ', ' אגואיסט פלטינום א.ד.ט Egoiste Platinum e.d.t ', ' או דה בלנק א.ד.ט Eau De Blanc e.d.t ', ' או דה פרש א.ד.ט Eau Fraiche e.d.t ', ' אובסיישן לגבר א.ד.ט Obsession for men e.d.t ']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
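
The strings in the console output above carry leading and trailing spaces, and innerHTML returns markup, so a character such as & may come back as an entity. A possible cleanup step, sketched with the standard library only, is shown below. clean_name is a hypothetical helper, and the sample strings are modeled on the output above; the "&amp;" in the second one is an assumption about what innerHTML might return for that item.

```python
import html


def clean_name(raw):
    """Decode HTML entities and collapse runs of whitespace in a scraped string."""
    decoded = html.unescape(raw)
    # split() + join() trims both ends and collapses internal whitespace runs.
    return " ".join(decoded.split())


raw_names = [
    " אגואיסט א.ד.ט Egoiste e.d.t ",
    " Abercrombie &amp; Fitch First  Instinct e.d.t ",
]
cleaned = [clean_name(name) for name in raw_names]
print(cleaned)
```

The same function could be applied inside either list comprehension, for example clean_name(my_elem.text).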
    