Using Python & Selenium, how to extract the text from HTML containing the <p> tag

I know this is a very simple question. I'm quite sick, I'm trying to finish up this presentation, and my brain just doesn't seem to be working right.

The HTML code is as follows:

<p>id="script_id">1</p>

CodePudding user response：

If the text of the element really is id="script_id">1, then you can use the following:

from selenium.webdriver.common.by import By
# .text of the first <p> element on the page
x = driver.find_element(By.TAG_NAME, 'p').text
print(x)

Output:

id="script_id">1

Note, however, that p elements may occur in many places on a page, not just on this line, and find_element returns only the first match, so relying on the p tag alone is not advisable in general. If this line is all you care about, it is fine, but in a real application you may have to look at other attributes or surrounding elements and build a locator from all of them.

If the line you provided contains a typo and script_id is in fact the id attribute of the p element, i.e. <p id="script_id">1</p>, then this would do:

x = driver.find_element(By.ID, 'script_id').text
print(x)
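
The difference between the tag-only lookup and the id lookup can be seen without a browser. The following is only a sketch: it uses Python's standard-library xml.etree.ElementTree as a stand-in for the page's DOM, and the sample markup (the extra "intro" paragraph and the wrapping div) is made up for illustration. Selenium's find_element likewise returns the first matching element.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: two <p> elements, only one carries the id.
PAGE = '<div><p>intro</p><p id="script_id">1</p></div>'

root = ET.fromstring(PAGE)

# Tag-only lookup: the first <p> in document order wins.
first_p = root.find(".//p")

# Attribute-based lookup: matches only the <p> whose id is script_id.
by_id = root.find(".//p[@id='script_id']")

print(first_p.text)  # intro
print(by_id.text)    # 1
```

With more than one p on the page, only the id-based lookup reliably reaches the intended element.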

CodePudding user response：

The HTML in its current form is malformed: the id attribute sits outside the opening tag. Ideally it should have been:

<p id="script_id">1</p>
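
A quick way to see why the two forms behave differently is to parse both. This is a sketch using Python's standard-library html.parser rather than a real browser; the ParagraphInspector class and the inspect function are hypothetical helpers written for this illustration.

```python
from html.parser import HTMLParser


class ParagraphInspector(HTMLParser):
    """Records the attributes and text of <p> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.p_attrs = {}
        self._chunks = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.p_attrs = dict(attrs)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join them.
        if self._in_p:
            self._chunks.append(data)

    @property
    def p_text(self):
        return "".join(self._chunks)


def inspect(markup):
    """Return (attributes, text) of the <p> element in `markup`."""
    parser = ParagraphInspector()
    parser.feed(markup)
    parser.close()
    return parser.p_attrs, parser.p_text


# As posted: the id sits after the tag's closing ">", so it is plain text.
print(inspect('<p>id="script_id">1</p>'))  # ({}, 'id="script_id">1')

# Corrected: id is a real attribute and the text is just "1".
print(inspect('<p id="script_id">1</p>'))  # ({'id': 'script_id'}, '1')
```

In the posted form there is no id attribute to locate by, which is why By.ID only works against the corrected markup.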

To locate the element whose text is 1, you can use any of the following locator strategies:

Using id:

element = driver.find_element(By.ID, "script_id")

Using css_selector:

element = driver.find_element(By.CSS_SELECTOR, "p#script_id")

Using xpath:

element = driver.find_element(By.XPATH, "//p[@id='script_id']")

To extract the text 1 from the element, ideally you should first wait for it to become visible, using WebDriverWait with visibility_of_element_located(). You can use any of the following locator strategies:

Using ID and text attribute:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "script_id"))).text)

Using CSS_SELECTOR and get_attribute("innerText"):

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p#script_id"))).get_attribute("innerText"))

Using XPATH and get_attribute("innerHTML"):

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[@id='script_id']"))).get_attribute("innerHTML"))

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".

References

Links to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium