I'm trying to extract a specific piece of text from HTML using XPath.
The HTML is shown below. As you can see,
the "target text" I want is inside the p node.
But "target text" has no element or attribute of its own;
it is a bare text node directly inside p.
How can I get it?
Supplement: I'm using XPath in Selenium, so I can't end the XPath query with "text()" (Selenium locators must return elements, not text nodes).
<p lang="ko">
<span >non-target text1 </span>
<span >non-target text2 </span>
target text
</p>
CodePudding user response:
target text belongs to the parent p node.
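To make that concrete, here is a small offline sketch (no browser needed) that parses the HTML from the question with Python's standard xml.etree.ElementTree and lists the text that sits directly in p. The helper name own_text_nodes is made up for this illustration, and the snippet assumes the fragment is well-formed enough to parse as XML.

```python
import xml.etree.ElementTree as ET

# The fragment from the question (assumed to be well-formed XML).
SAMPLE = """<p lang="ko">
<span>non-target text1 </span>
<span>non-target text2 </span>
target text
</p>"""


def own_text_nodes(element):
    """Return the element's direct text nodes, skipping whitespace-only ones."""
    # ElementTree keeps the text before the first child in .text and the
    # text that follows each child in that child's .tail.
    pieces = [element.text] + [child.tail for child in element]
    return [piece for piece in pieces if piece and piece.strip()]


p = ET.fromstring(SAMPLE)
nodes = own_text_nodes(p)
print(repr(nodes[0]))   # raw text node, line breaks included
print(nodes[0].strip()) # trimmed version
```

The only non-blank text node that belongs directly to p is the one after the second span, which is why it cannot be addressed through any child element.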
What you need to do here is:
Get the parent element's text (it includes the parent's own text as well as the text of its child elements).
Then remove the child elements' text from it.
With Selenium, the code can look like this:
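from selenium.webdriver.common.by import By

# driver is assumed to be an already-initialised WebDriver
parent = driver.find_element(By.XPATH, "//p[@class='mean']")
parent_text = parent.text  # own text plus the text of all children
# Remove each direct child's text; accumulate in parent_text so every removal is kept
for child_element in parent.find_elements(By.XPATH, "./*"):
    parent_text = parent_text.replace(child_element.text, '')
print(parent_text.strip())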
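The subtraction step can also be tried without a browser. The sketch below works on plain strings; remove_child_texts is a hypothetical helper, and the sample strings are only an assumption about what Selenium's .text would return for the parent and for each span. It removes just the first occurrence of each child's text, so a later repeat of the same phrase in the parent's own text is left alone.

```python
def remove_child_texts(all_text, child_texts):
    """Remove each child's text from the parent's full text, once per child."""
    remaining = all_text
    for child_text in child_texts:
        # count=1: drop only the first occurrence of this child's text
        remaining = remaining.replace(child_text, "", 1)
    # Collapse the whitespace left behind by the removals
    return " ".join(remaining.split())


# Assumed values of parent.text and of each child's .text
all_text = "non-target text1 non-target text2 target text"
child_texts = ["non-target text1", "non-target text2"]
result = remove_child_texts(all_text, child_texts)
print(result)
```

Note that the result must be carried over from one iteration to the next; replacing on the original string each time would keep only the last removal.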
CodePudding user response:
Use //p[@class = 'mean' and @lang = 'ko']/text()[normalize-space()] to select any text node children of that p element that contain more than white space. Note that the text node's content begins after the closing </span> and ends before the closing </p>, so it includes the surrounding line breaks; its content will be e.g.
target text
If you want to remove leading and trailing white space, you can use e.g. normalize-space(//p[@class = 'mean' and @lang = 'ko']/text()[normalize-space()]).
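Since Selenium's own locators only accept XPath expressions that return elements, one possible way to run an expression like this is to hand it to the browser's document.evaluate through driver.execute_script. This is a minimal sketch, not the only option: xpath_string is a hypothetical helper name, and driver is assumed to be an existing WebDriver with the page already loaded.

```python
def xpath_string(driver, xpath):
    """Evaluate `xpath` in the browser and return its string value, stripped."""
    # XPathResult.STRING_TYPE yields the string value of the first matching
    # node, so text() expressions work here even though Selenium's own
    # XPath locators would reject them.
    script = (
        "return document.evaluate(arguments[0], document, null, "
        "XPathResult.STRING_TYPE, null).stringValue;"
    )
    return driver.execute_script(script, xpath).strip()


# Usage (assuming `driver` is a live WebDriver):
# text = xpath_string(
#     driver, "//p[@class = 'mean' and @lang = 'ko']/text()[normalize-space()]"
# )
```

The XPath is passed as a script argument rather than concatenated into the JavaScript, which avoids quoting problems when the expression itself contains quotes.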