I'm trying to extract information from this page: https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1
More specifically the text in the first row under 'Subject'. See here: //*[@id="main-content"]/app-list-component/div/div[2]/div[2]/div/app-nt-list-item[1]/div/div[4]/p
Does anybody have an idea how to do this? I've tried numerous approaches, but the result always comes back empty. It looks like a dynamic list...
Below is the last thing I've tested.
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(executable_path='//chromedriver.exe')
driver.get("https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1")
subject = driver.find_element("xpath", '//*[@id="main-content"]/app-list-component/div/div[2]/div[2]/div/app-nt-list-item[1]/div/div[4]/p')
print(subject)
driver.quit()
Thank you very much in advance! As you can see, I'm not very skilled at this :(
CodePudding user response:
@Carlos Abundancia, you need to read the text attribute of the web element. Change the following line in your code:
print(subject)
to:
print(subject.text)
and you should get your output. I also suggest adding an explicit wait for the subject element: it takes some time to load, and without the wait I was getting NoSuchElementException. My version of the code with the wait is:
driver.get("https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1")
subject = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="main-content"]/app-list-component/div/div[2]/div[2]/div/app-nt-list-item[1]/div/div[4]/p')))
print(subject.text)
driver.quit()
You will need the following imports:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
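For intuition on why the wait helps: an explicit wait is essentially a polling loop that retries the lookup until it succeeds or a timeout expires. The sketch below is plain Python and is not Selenium's actual implementation; `wait_until`, `ElementNotReady`, and `make_fake_lookup` are hypothetical names made up for illustration, and the fake lookup only simulates an element that shows up after a couple of failed attempts.

```python
import time


class ElementNotReady(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    ElementNotReady is swallowed between attempts. TimeoutError is raised
    if no truthy value arrives before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ElementNotReady:
            pass  # element is not in the DOM yet, so try again
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


def make_fake_lookup(ready_after, text):
    """Simulate a lookup that fails `ready_after` times, then returns `text`."""
    calls = {"n": 0}

    def lookup():
        calls["n"] += 1
        if calls["n"] <= ready_after:
            raise ElementNotReady("not rendered yet")
        return text

    return lookup
```

A bare find_element() call corresponds to a single attempt, which is why it can fail on a page that fills in its list after the initial load. With the loop, the first failed attempts are simply retried.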
CodePudding user response:
You need to take care of a couple of things here:
- subject is the WebElement returned by driver.find_element(). Instead of printing the element itself, you need to print its text.
- The desired element may not be readily available within the HTML DOM, so you need to wait for it.
Solution
To extract and print the text Salmonella in raw chicken from Latvia, ideally you need to induce WebDriverWait for visibility_of_element_located(). You can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get('https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.LINK_TEXT, "Accept all cookies"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.results-list-body>app-nt-list-item>div div[class*='item-subject']>p"))).text)
Using XPATH:
driver.get('https://webgate.ec.europa.eu/rasff-window/screen/list?consumer=-1')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.LINK_TEXT, "Accept all cookies"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='results-list-body']/app-nt-list-item/div//div[contains(@class, 'item-subject')]/p"))).text)
Console Output:
Salmonella in raw chicken from Latvia
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python