I'm using Python and trying to get the price ($25.99) from the Amazon product page below.
I tried both Beautiful Soup and Selenium, but my Selenium code doesn't work.
#with beautiful soup
import requests
from bs4 import BeautifulSoup
PRODUCT="https://www.amazon.com/Guffercty-kred-Sublimation-Mechanical-Keyboard/dp/B09HWZQQZJ/ref=sr_1_14?crid=3UHD6OMRY6RYG&keywords=keycaps&qid=1667444474&qu=eyJxc2MiOiI4Ljc5IiwicXNhIjoiOC41OCIsInFzcCI6IjcuOTMifQ==&sprefix=keycap,aps,275&sr=8-14&th=1"
response = requests.get(PRODUCT, headers={
    "Accept-Language": "ko,en-US;q=0.9,en;q=0.8,sv;q=0.7,ja;q=0.6",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
})
soup = BeautifulSoup(response.text, "html.parser")
# Keep the price as a string: float() cannot parse a value with a leading "$".
price = soup.find(name="span", class_="a-offscreen").getText()
print(price)
The code above works for me and prints $25.99 to the console.
However, the Selenium code below doesn't work.
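Side note: if the price is needed as a number rather than a string, the "$" has to be removed before calling float(). A minimal sketch, assuming US-style formatting ("$" prefix, "," as thousands separator); parse_price is a hypothetical helper name, not part of bs4 or requests:

```python
def parse_price(text: str) -> float:
    """Convert a price string such as "$25.99" to a float.

    Assumes a "$" currency symbol and "," thousands separators.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned)


print(parse_price("$25.99"))  # 25.99
```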
#with selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("/my/chrome/driver/path/chromedriver")
driver = webdriver.Chrome(service=service)
driver.get(url="https://www.amazon.com/Guffercty-kred-Sublimation-Mechanical-Keyboard/dp/B09HWZQQZJ/ref=sr_1_14?crid=3UHD6OMRY6RYG&keywords=keycaps&qid=1667444474&qu=eyJxc2MiOiI4Ljc5IiwicXNhIjoiOC41OCIsInFzcCI6IjcuOTMifQ==&sprefix=keycap,aps,275&sr=8-14&th=1")
price = driver.find_element(By.CSS_SELECTOR, 'span .a-offscreen')
print(price.text)
Unlike the bs4 code, the Selenium code prints nothing to the console.
I thought "find_element(By.CSS_SELECTOR, 'span .a-offscreen')" in Selenium would work the same as "find(name='span', class_='a-offscreen')" in bs4.
I also tried By.XPATH, but that doesn't work either. Am I missing something?
CodePudding user response:
You probably need to wait for the page to finish rendering. Or you're finding some other element. I see 60 items that match that selector.
I'd try a selector like: div#corePrice_feature_div span .a-offscreen
And then wait for that element to be displayed and enabled. https://www.selenium.dev/documentation/webdriver/waits/#explicit-wait
If you don't want to write your own lambda, Selenium's Python bindings include an expected_conditions module with ready-made wait conditions. There's one for text_to_be_present_in_element: https://www.selenium.dev/selenium/docs/api/py/webdriver_support/selenium.webdriver.support.expected_conditions.html?highlight=expected
CodePudding user response:
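There are many span elements with that class, so it's best to make the selector specific to the main item via a parent div:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 2 seconds for the main price span to be present in the DOM.
price = WebDriverWait(driver, 2).until(
    EC.presence_of_element_located(
        (By.CSS_SELECTOR, "div#corePrice_feature_div span.a-offscreen")
    )
)
# textContent includes text that is not rendered on screen.
print(price.get_attribute("textContent"))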
price.text is empty because "text" only returns visible text, whereas (as the class name suggests) this span is offscreen. The visible price is assembled from the other spans that hold the symbol, the whole part and the fraction.
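If you'd rather wait until the span actually has content, a custom wait condition is one option. This is only a sketch: text_content_present is a hypothetical helper name, and it relies on WebDriverWait.until() calling the condition with the driver and polling again while the return value is falsy.

```python
def text_content_present(locator):
    """Build a wait condition that returns an element's textContent once non-empty.

    `locator` is a (by, value) tuple, e.g. ("css selector", "span.a-offscreen").
    """
    def _condition(driver):
        for element in driver.find_elements(*locator):
            # textContent also covers text that is not visibly rendered.
            text = (element.get_attribute("textContent") or "").strip()
            if text:
                return text  # a truthy value ends the wait
        return False  # a falsy value makes WebDriverWait poll again
    return _condition


# Intended use with a real driver (assumed setup, not executed here):
#   from selenium.webdriver.support.ui import WebDriverWait
#   locator = ("css selector", "div#corePrice_feature_div span.a-offscreen")
#   price_text = WebDriverWait(driver, 10).until(text_content_present(locator))
```

The condition returns the text itself rather than the element, so the result of until() can be printed directly.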