I'm trying to extract the text value that follows a b tag containing specific text. I'm using Selenium WebDriver with Python 3.
The HTML inspected for the value I'm trying to return (11,847) is here:
Its XPath is below (I'm not using this XPath directly to find the element, because the table structure changes across the examples I plan to iterate through):
/html/body/form[1]/div[2]/table[2]/tbody/tr[3]/td[2]/text()
As an example, when I print the below it returns Att: i.e. the element located by my search for the text 'Att' within the b tags.
att=driver.find_element("xpath",".//b[contains(text(), 'Att')]").text
print(att)
Is there a way I can return the value following <b>Att:</b> by searching for 'Att:'? Conversely, I'd also like to return the value following <b>Ref:</b>.
Thanks in advance.
CodePudding user response:
The 11,847 text content belongs to the td node.
You can locate this td element by its child b text content. Then you will be able to retrieve the entire text content of that td node. It will contain Att:, extra spaces, and the desired 11,847 string.
Now you need to remove the Att: and the extra spaces so that only 11,847 remains, as follows:
from selenium.webdriver.common.by import By

# get the entire text content of the td
entire_text = driver.find_element(By.XPATH, "//td[.//b[contains(text(), 'Att')]]").text
# get the text content of the child b node
child_text = driver.find_element(By.XPATH, "//b[contains(text(), 'Att')]").text
# remove the child text content from the entire text content
goal_text = entire_text.replace(child_text, '')
# trim white space
goal_text = goal_text.strip()
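The steps above can be wrapped in a small helper so the same logic serves both 'Att' and 'Ref'. This is only a sketch: strip_label and value_for_label are hypothetical names, driver is assumed to be a Selenium WebDriver, and the "xpath" strategy string is the one used in the question. It removes only the first occurrence of the label, in case the same characters also appear in the value.

```python
def strip_label(cell_text, label_text):
    """Remove the first occurrence of label_text from cell_text and trim white space."""
    return cell_text.replace(label_text, "", 1).strip()


def value_for_label(driver, label):
    """Return the text that shares a td with a b element containing `label`.

    Assumes `label` contains no single quote, since it is pasted into the
    XPath expression as-is.
    """
    # The td that has a b descendant whose text contains the label.
    cell = driver.find_element("xpath", f"//td[.//b[contains(text(), '{label}')]]")
    # The b element itself, searched relative to that td.
    bold = cell.find_element("xpath", f".//b[contains(text(), '{label}')]")
    return strip_label(cell.text, bold.text)
```

With a live page this would be called as value_for_label(driver, 'Att') or value_for_label(driver, 'Ref'). If the td holds more than one label/value pair, the result will still contain the other pairs.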
CodePudding user response:
Selenium's find_element() can only return elements, so an XPath ending in text() cannot be used to fetch the following text node directly (and the find_element_by_xpath() method was deprecated and later removed in Selenium 4). Instead, locate the element that contains the text 'Att:' with find_element(), then read its next sibling node with execute_script(). Here is an example of how you can do this:
att_element = driver.find_element(By.XPATH, "//b[contains(text(), 'Att:')]")
att_value = driver.execute_script("return arguments[0].nextSibling.textContent;", att_element).strip()
print(att_value)
This locates the element that contains the text 'Att:', reads the node that immediately follows it, and prints that node's text with the surrounding white space removed.
You can use the same approach for 'Ref:'; just change the text part to 'Ref:':
ref_element = driver.find_element(By.XPATH, "//b[contains(text(), 'Ref:')]")
ref_value = driver.execute_script("return arguments[0].nextSibling.textContent;", ref_element).strip()
print(ref_value)
Note that this only works if the value you're trying to extract is in a text node immediately following the element that contains 'Att:' or 'Ref:'. It also needs the import from selenium.webdriver.common.by import By.
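When nothing follows the b element, or the next node is another element rather than text, reading the sibling blindly can fail or return the wrong thing. The sketch below guards against that case. It is an illustration rather than a definitive implementation: text_after and _NEXT_TEXT_JS are hypothetical names, driver is assumed to be a Selenium WebDriver, and "xpath" is the strategy string used in the question.

```python
# JavaScript run in the browser: return the text of the node right after the
# element, or null when there is no next sibling or it is not a text node
# (nodeType 3 is a text node).
_NEXT_TEXT_JS = """
const node = arguments[0].nextSibling;
return node && node.nodeType === 3 ? node.textContent : null;
"""


def text_after(driver, label):
    """Return the trimmed text node following the b element containing `label`.

    Returns None when no text node directly follows the element.
    Assumes `label` contains no single quote.
    """
    element = driver.find_element("xpath", f"//b[contains(text(), '{label}')]")
    raw = driver.execute_script(_NEXT_TEXT_JS, element)
    return raw.strip() if raw is not None else None
```

A call such as text_after(driver, 'Att:') would then give either the value or None, which the caller can check before using it.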
CodePudding user response:
The following XPath would result in an error:
/html/body/form[1]/div[2]/table[2]/tbody/tr[3]/td[2]/text()
as Selenium locators can return only WebElements, not text nodes.
Solution
The text 11,847 is within a text node which is the second descendant of the <td> node. So to print the text you have to use WebDriverWait with visibility_of_element_located(), and you can use either of the following locator strategies:
Using XPATH and childNodes[n]:
print(driver.execute_script('return arguments[0].childNodes[2].textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='initial']//td[@align='right']")))).strip())
Using XPATH and splitlines():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='initial']//td[@align='right']"))).get_attribute("innerHTML").splitlines()[2])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
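The splitlines() variant above depends on the value sitting at a fixed line index. If the cell's visible text happens to put each label and its value on one line, such as "Att: 11,847" (an assumption, since the HTML is not shown in this thread), looking the value up by label avoids the hard-coded index. A minimal sketch, with parse_labelled_lines as a hypothetical helper name:

```python
def parse_labelled_lines(text):
    """Map each 'Label: value' line of `text` to a dict entry.

    Lines without a colon, or with nothing before the colon, are skipped.
    """
    values = {}
    for line in text.splitlines():
        # Split on the first colon only, so colons inside the value survive.
        label, sep, value = line.partition(":")
        if sep and label.strip():
            values[label.strip()] = value.strip()
    return values
```

The input would be the .text of the td element located by either strategy, for example parse_labelled_lines(cell.text).get("Att").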