Unable to extract text in a list with Selenium in Python

Time:09-04

I'm trying to extract information from this page: https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1

More specifically the text in the first row under 'Subject'. See here: //*[@id="main-content"]/app-list-component/div/div[2]/div[2]/div/app-nt-list-item[1]/div/div[4]/p

Does anybody have an idea how to do so? I've tried numerous ways, but the result always comes back empty. It looks like a dynamic list...

Below is the last thing I've tested.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path='//chromedriver.exe')
driver.get("https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1")

subject = driver.find_element("xpath", '//*[@id="main-content"]/app-list-component/div/div[2]/div[2]/div/app-nt-list-item[1]/div/div[4]/p')

print(subject)

driver.quit()

Thank you very much in advance! As you can see, I'm not very skilled at this :(

CodePudding user response:

@Carlos Abundancia, you need to read the text attribute of the web element. So if you change the following line in your code:

print(subject)

to:

print(subject.text)

you should get your output. I also suggest adding an explicit wait for the subject element: it takes some time to load, and without the wait I was getting NoSuchElementException. My version of the code with the wait is:

driver.get("https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1")
subject = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="main-content"]/app-list-component/div/div[2]/div[2]/div/app-nt-list-item[1]/div/div[4]/p')))
print(subject.text)
driver.quit()

You will need the following imports:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By

CodePudding user response:

You need to take care of a couple of things here as follows:

  • subject is the WebElement returned by driver.find_element(). Instead of printing the element itself, you need to print its text.
  • The desired element may not be readily available in the HTML DOM when find_element() runs, so you need to wait for it.
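
The second point can be illustrated without a browser. Below is a minimal sketch of the polling idea behind an explicit wait: call a lookup repeatedly until it succeeds or a deadline passes. It is a simplified stand-in, not Selenium's actual implementation, and wait_until, SlowPage and find_subject are hypothetical names invented for this illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5, ignored=(LookupError,)):
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # the element is not there yet, so poll again
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class SlowPage:
    """Hypothetical page whose list only appears after a few lookups."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def find_subject(self):
        self.calls += 1
        if self.calls < self.ready_after:
            raise LookupError("element not in the DOM yet")
        return "Salmonella in raw chicken from Latvia"


page = SlowPage(ready_after=3)
# A single immediate lookup would raise; polling succeeds on the third try.
print(wait_until(page.find_subject, timeout=2, poll_interval=0.01))
```

A single find_element() call right after driver.get() behaves like the first lookup here: it fails because the list has not rendered yet, which matches the NoSuchElementException reported in the other answer.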

Solution

To extract and print the text Salmonella in raw chicken from Latvia, ideally you need to induce WebDriverWait for visibility_of_element_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.LINK_TEXT, "Accept all cookies"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.results-list-body>app-nt-list-item>div div[class*='item-subject']>p"))).text)
    
  • Using XPATH:

    driver.get('https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.LINK_TEXT, "Accept all cookies"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='results-list-body']/app-nt-list-item/div//div[contains(@class, 'item-subject')]/p"))).text)
    
  • Console Output:

    Salmonella in raw chicken from Latvia
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
