Python3 Selenium - Failed to extract the text value from an element in a HTML page (web scraping)

Time:08-19

I have the following HTML in a web page, from which I need to retrieve the number of jobs in a table:

<span class="k-pager-info k-label">1 - 10 of 16 items</span>

I can find the element successfully in various ways, but when I try to retrieve the number of rows (the 16 in "1 - 10 of 16 items"), I get an empty result.

I find the element as shown below, which prints the element's session and element ID:

job_items = driver.find_element(By.CSS_SELECTOR, 'span.k-pager-info.k-label')
print('Jobs: ', job_items)

Output:  Jobs:  <selenium.webdriver.remote.webelement.WebElement (session="f528f37ec897b8c5006b3b5040a99c12", element="758533ea-800f-4895-ba66-e8247e882edb")>

Getting the same element using XPath, and now requesting the text value:

job_items = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[4]/div/div[2]/div[1]/div/div/span[2]').text
print('Jobs1: ', job_items)

Output:    Jobs1:  No items to display

I tried XXX.get_attribute("innerHTML") as well; it also returns an empty result.

What am I missing please?

CodePudding user response:

In the first part you are missing .text. It should be:

job_items = driver.find_element(By.CSS_SELECTOR, 'span.k-pager-info.k-label').text
print('Jobs: ', job_items)

The command driver.find_element(By.CSS_SELECTOR, 'span.k-pager-info.k-label') returns a web element object. To extract its text, .text should be applied to it.

In the second part you are already applying .text to the returned web element object, so job_items holds the text string rather than the element.
To keep both parts consistent, you can do the following:

job_items = driver.find_element(By.CSS_SELECTOR, 'span.k-pager-info.k-label')
print('Jobs: ', job_items.text)

job_items = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[4]/div/div[2]/div[1]/div/div/span[2]')
print('Jobs1: ', job_items.text)
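
Once .text returns the pager string, the total (the 16 in the question) still has to be pulled out of it. A minimal sketch, assuming the text always follows the "1 - 10 of 16 items" pattern shown in the question; the name parse_pager_total and the regular expression are illustrative and not part of Selenium:

```python
import re

# Assumed format, taken from the question: "<first> - <last> of <total> items".
PAGER_PATTERN = re.compile(r"(\d+)\s*-\s*(\d+)\s+of\s+(\d+)\s+items")


def parse_pager_total(pager_text):
    """Return the total item count from the pager text, or None if it does not match."""
    match = PAGER_PATTERN.search(pager_text)
    if match is None:
        # Covers placeholder text such as "No items to display".
        return None
    return int(match.group(3))


print(parse_pager_total("1 - 10 of 16 items"))   # 16
print(parse_pager_total("No items to display"))  # None
```

With the element located as above, this would be called as parse_pager_total(job_items.text).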

CodePudding user response:

You were close. In your first attempt you printed the element itself, which is why you see the output as:

<selenium.webdriver.remote.webelement.WebElement (session="f528f37ec897b8c5006b3b5040a99c12", element="758533ea-800f-4895-ba66-e8247e882edb")>

whereas in your use case you want the innerText. So you have to use the text attribute or the get_attribute() method.


Solution

To print the text 1 - 10 of 16 items you can use either of the following locator strategies:

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

    print(driver.find_element(By.CSS_SELECTOR, "span.k-pager-info.k-label").get_attribute("innerHTML"))
    
  • Using XPATH and text attribute:

    print(driver.find_element(By.XPATH, "//span[@class='k-pager-info k-label']").text)
    
  • Note: You have to add the following import:

    from selenium.webdriver.common.by import By
    

To extract the text 1 - 10 of 16 items, ideally you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR and text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.k-pager-info.k-label"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='k-pager-info k-label']"))).get_attribute("innerHTML"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
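
The "No items to display" output in the question may mean the pager was read before the grid had loaded its rows; that is an assumption, since the page itself is not shown. visibility_of_element_located() only waits for the span to become visible, not for its text to change, so one option is a custom wait condition. A sketch, where pager_text_loaded is a hypothetical helper name and the pattern assumes the "1 - 10 of 16 items" format from the question:

```python
import re

# Assumed pager format from the question: "1 - 10 of 16 items".
PAGER_PATTERN = re.compile(r"\d+\s*-\s*\d+\s+of\s+\d+\s+items")


def pager_text_loaded(locator):
    """Build a wait condition that returns the pager text once it contains counts."""

    def condition(driver):
        text = driver.find_element(*locator).text
        # Returning False tells WebDriverWait.until() to keep polling.
        return text if PAGER_PATTERN.search(text) else False

    return condition


# Intended use with a real driver (needs the imports listed above):
# text = WebDriverWait(driver, 20).until(
#     pager_text_loaded((By.CSS_SELECTOR, "span.k-pager-info.k-label"))
# )
```

until() returns whatever the condition returns, so text would hold the loaded pager string. A TimeoutException after the 20 seconds would suggest that the grid really is empty or that the locator matches a different element.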
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
