I am trying to use Selenium to scrape dynamic webpages. Here, I tried to print all the authors on the website:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://quotes.toscrape.com/js")
elements = driver.find_elements_by_class_name("author")
for i in elements:
    print(i.text)
driver.quit()
This worked well and printed the right result:
Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thomas A. Edison
Eleanor Roosevelt
Steve Martin
But when I try to use similar code for another website, I get an error:
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator
(Session info: chrome=98.0.4758.102)
This is my second code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
url = 'https://www.myperfume.co.il/155567-כל-המותגים-לגבר?order=up_title'
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
elements = driver.find_elements_by_class_name("title text-center")
for i in elements:
    print(i.text)
driver.quit()
What I am trying to do in this code is print the names of all the perfumes on the webpage. After inspecting the page, I saw that all of the names are in elements whose class is 'title text-center'.
How can I fix my code?
CodePudding user response:
title text-center is actually two class names: title and text-center.
find_elements_by_class_name accepts only a single class name, so to locate elements by two class names you have to use XPath or a CSS selector.
So, instead of
elements = driver.find_elements_by_class_name("title text-center")
You can use
elements = driver.find_elements_by_xpath("//h3[@class='title text-center']")
Or
elements = driver.find_elements_by_css_selector("h3.title.text-center")
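To make the relationship between the class attribute and the CSS selector concrete, here is a minimal sketch. The helper name classes_to_css is hypothetical (it is not part of Selenium), and it assumes the class names contain no characters that would need CSS escaping:

```python
def classes_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration only; not a Selenium API.
    """
    names = class_attr.split()  # "title text-center" -> ["title", "text-center"]
    if not names:
        raise ValueError("expected at least one class name")
    # Each class name becomes ".name"; chaining them requires all classes at once.
    return tag + "".join("." + name for name in names)


selector = classes_to_css("title text-center", tag="h3")
print(selector)
```

The resulting string is what you would pass to a CSS selector lookup. Unlike the XPath with @class='title text-center', which compares the whole attribute string, the compound CSS selector should match regardless of the order of the classes or the presence of extra ones.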
Also, you should add waits so that you access the web elements only once they are loaded and ready.
This should be done with explicit waits using Expected Conditions, as follows:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.myperfume.co.il/155567-כל-המותגים-לגבר?order=up_title'
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
wait = WebDriverWait(driver, 20)
driver.get(url)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3.title.text-center")))
elements = driver.find_elements(By.CSS_SELECTOR, "h3.title.text-center")
for i in elements:
    print(i.text)
driver.quit()
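If it helps to see why an explicit wait fixes timing problems, the idea is roughly "keep checking a condition until it returns something truthy or a timeout expires". The following is only a sketch of that polling idea in plain Python, not Selenium's actual implementation; wait_until and fake_find_elements are made-up names, and the fake function stands in for a page whose elements appear late:

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand back whatever the condition found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a dynamic page: the elements only "exist" from the third check on.
attempts = {"count": 0}


def fake_find_elements():
    attempts["count"] += 1
    return ["perfume A", "perfume B"] if attempts["count"] >= 3 else []


names = wait_until(fake_find_elements, timeout=5.0, poll_interval=0.01)
print(names)
```

In the real code, WebDriverWait(driver, 20).until(...) plays the role of wait_until and the expected condition plays the role of the function being polled.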
CodePudding user response:
This error message...
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator
...implies that the locator you used is not valid, because By.CLASS_NAME takes a single class name as an argument.
To print the names of all the perfumes on the webpage, you can use a list comprehension with the following locator strategy:
Using css_selector:
driver.get("https://www.myperfume.co.il/155567-כל-המותגים-לגבר?order=up_title")
print([my_elem.get_attribute("innerHTML") for my_elem in driver.find_elements_by_css_selector("h3.title")])
Ideally, you should induce WebDriverWait for visibility_of_all_elements_located(), using the following locator strategy:
Using CSS_SELECTOR and the text attribute:
driver.get("https://www.myperfume.co.il/155567-כל-המותגים-לגבר?order=up_title")
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h3.title")))])
Console Output:
[' 212 וי אי פי לגבר א.ד.ט 212 vip for men e.d.t ', ' 212 ניו יורק לגבר א.ד.ט 212 nyc for men e.d.t ', ' 212 סקסי לגבר א.ד.ט 212 sexy men e.d.t ', ' אברקרומבי פירס 100 מל א.ד.ק Abercrombie & Fitch Fierce 100 ml e.d.c ', ' אברקרומבי פירס 50 מל א.ד.ק Abercrombie & Fitch Fierce 50 ml e.d.c ', ' אברקרומבי פירס גודל ענק 200 מל א.ד.ק Abercrombie & Fitch Fierce 200 ml e.d.c ', ' אברקרומבי פירסט אינסטינקט לגבר א.ד.ט Abercrombie & Fitch First Instinct e.d.t ', ' אגואיסט א.ד.ט Egoiste e.d.t ', ' אגואיסט פלטינום א.ד.ט Egoiste Platinum e.d.t ', ' או דה בלנק א.ד.ט Eau De Blanc e.d.t ', ' או דה פרש א.ד.ט Eau Fraiche e.d.t ', ' אובסיישן לגבר א.ד.ט Obsession for men e.d.t ']
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
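The strings in the console output above carry leading and trailing spaces. If you want tidier names, one option is to collapse the whitespace after scraping. This is just a sketch: normalize_name is a hypothetical helper, and the two sample strings are copied from the output shown above:

```python
def normalize_name(raw):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(raw.split())


# Two entries taken from the console output, including the stray spaces.
raw_names = [
    " 212 וי אי פי לגבר א.ד.ט 212 vip for men e.d.t ",
    " אגואיסט א.ד.ט Egoiste e.d.t ",
]

clean_names = [normalize_name(name) for name in raw_names]
print(clean_names)
```

The same call could be applied inside the list comprehension, for example around my_elem.get_attribute("innerHTML").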