Extract the value of a webpage title by scraping the element

I'm very new to Python and coding, so please bear with me.

I'm trying to extract the text of a webpage's title (the URL is input by the user) by locating the corresponding WebElement on the page with Selenium and reading its value.

However, it keeps returning the value 'none' instead of what I would expect to see (in this case, 'BLACK BELTED WRAP COAT').

Code can be found below:

title = driver.find_elements(By.XPATH,('/html/body/div[4]/div/div[3]/div[4]/div[1]/div[1]/form/div/div[2]/a/h2'))

# Rest of the code is hidden, but if you need more, please do let me know. (I'm new and don't want to spam.)

Any idea what's causing this?

The source URL I'm entering is: https://www.riverisland.com/p/black-belted-wrap-coat-782866

This runs without error, but returns an unexpected value (shown in two screenshots in the original post).

Appreciate it and apologies if I've missed anything. Ginge

CodePudding user response：

If you are trying to find a single element, use find_element instead of find_elements. find_elements returns a list of WebElements, not a single WebElement.
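To make the list-versus-element distinction concrete, here is a minimal sketch. The helper name first_element_text is hypothetical (it is not part of Selenium), and SimpleNamespace objects stand in for WebElements so the snippet runs without a browser.

```python
from types import SimpleNamespace
from typing import Optional, Sequence


def first_element_text(elements: Sequence) -> Optional[str]:
    """Return .text of the first item in a find_elements() result.

    Hypothetical helper: find_elements() returns a list (possibly empty),
    so the list itself has no .text; index into it first.
    """
    if not elements:  # empty list: the locator matched nothing
        return None
    return elements[0].text


# Stand-ins for what find_elements() might return (assumed data, no browser needed).
matched = [SimpleNamespace(text="BLACK BELTED WRAP COAT")]
unmatched = []

print(first_element_text(matched))    # BLACK BELTED WRAP COAT
print(first_element_text(unmatched))  # None
```

With a real driver, the same helper would be called as first_element_text(driver.find_elements(By.XPATH, "//h2[@data-localize='Product_Title']")).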

Try the code below:

# Imports required for explicit waits
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver.get("https://www.riverisland.com/p/black-belted-wrap-coat-782866")

wait = WebDriverWait(driver,30)

# Click on Accept cookies
wait.until(EC.element_to_be_clickable((By.NAME,"accept-all"))).click()

title = wait.until(EC.visibility_of_element_located((By.XPATH,"//h2[@data-localize='Product_Title']")))
print(title.text)

BLACK BELTED WRAP COAT
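For intuition about why the explicit wait helps, the sketch below imitates the polling idea behind WebDriverWait.until() using only the standard library. It is a simplified stand-in, not Selenium's implementation; wait_until and title_ready are hypothetical names, and the condition simply pretends the title becomes available on the third check.

```python
import time


def wait_until(condition, timeout=30.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Simplified illustration of an explicit wait; raises TimeoutError
    if the condition is still falsy once the timeout has elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Assumed scenario: the element only "appears" on the third check,
# much like a title rendered after the page's scripts have run.
calls = {"n": 0}


def title_ready():
    calls["n"] += 1
    return "BLACK BELTED WRAP COAT" if calls["n"] >= 3 else None


found = wait_until(title_ready, timeout=2.0, poll=0.01)
print(found)
```

A single find_element call corresponds to checking the condition once; if the element is not there yet, that one check fails, whereas the wait keeps checking until the timeout.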

CodePudding user response：

To print the text BLACK BELTED WRAP COAT, you can use either of the following locator strategies:

Using css_selector and get_attribute("innerHTML"):

print(driver.find_element(By.CSS_SELECTOR, "h2.product-title.ui-product-title").get_attribute("innerHTML"))

Using xpath and text attribute:

print(driver.find_element(By.XPATH, "//h2[@class='product-title ui-product-title']").text)

Ideally, use WebDriverWait with visibility_of_element_located() so that the element is visible before you read it. You can use either of the following locator strategies:

Using CSS_SELECTOR and text attribute:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h2.product-title.ui-product-title"))).text)

Using XPATH and get_attribute("innerHTML"):

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h2[@class='product-title ui-product-title']"))).get_attribute("innerHTML"))

Console Output:

BLACK BELTED WRAP COAT

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".

References

Link to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium
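As a rough illustration of that difference: innerHTML is the raw markup inside the element, so it can include nested tags and stray whitespace, while text is the rendered, visible text. The sketch below strips tags from an innerHTML-style string using only the standard library. The helper inner_html_to_text and the sample markup are made up for illustration, and the result only approximates what Selenium's text would give (it ignores CSS visibility, for example).

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the character data between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def inner_html_to_text(inner_html: str) -> str:
    """Strip tags from an innerHTML string and collapse whitespace."""
    collector = _TextCollector()
    collector.feed(inner_html)
    collector.close()  # flush any buffered trailing text
    return " ".join("".join(collector.parts).split())


# Made-up markup, assumed for illustration only.
sample = "  BLACK <span>BELTED</span> WRAP COAT\n"
print(inner_html_to_text(sample))  # BLACK BELTED WRAP COAT
```

When the element contains only plain text, as the product title appears to here, both approaches print the same string.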

CodePudding user response：

You have all been massively helpful!

I got this fixed, thanks.

Ginge