Suppose I have the following HTML code
<tag1>
"hello"
<tag2></tag2>
"world"
</tag1>
Then driver.find_element(By.CSS_SELECTOR, "tag1").text outputs the string "helloworld". How can I get the strings "hello" and "world" separately, or at least get a string "hello world" separated by whitespace?
CodePudding user response:
<tag1> has 3 child nodes: the text node "hello", the empty <tag2/>, and the text node "world". When you call .text on a node, it just combines the text representations of all child nodes. You don't get a space between "hello" and "world" because whitespace that merely formats the markup is not significant in XML / HTML.
You can iterate over tag1's child nodes instead of calling .text, and then decide what to do with each child. If you know the structure is only 1 level deep, you can take the text of each child node and concatenate the results with a separator such as a space. If it goes deeper than one level, you can recurse into the child nodes.
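The recursion described above can also be pushed into the browser. The following is a minimal sketch, assuming a Selenium driver and an already located element. The names collect_text and TEXT_NODES_SCRIPT are made up for illustration and are not part of Selenium. The script walks every text node below the element with a TreeWalker and returns plain strings, which are then joined in Python:

```python
# Hypothetical helper, not part of Selenium: gathers every text node below
# `element`, at any depth, and joins the non-empty pieces with `sep`.
TEXT_NODES_SCRIPT = """
const walker = document.createTreeWalker(arguments[0], NodeFilter.SHOW_TEXT);
const parts = [];
while (walker.nextNode()) {
    parts.push(walker.currentNode.textContent);
}
return parts;
"""


def collect_text(driver, element, sep=" "):
    # execute_script passes `element` to the script as arguments[0] and
    # converts the returned array of strings into a Python list.
    parts = driver.execute_script(TEXT_NODES_SCRIPT, element)
    # Drop whitespace-only nodes (indentation, line breaks) and trim the rest.
    cleaned = [part.strip() for part in parts if part.strip()]
    return sep.join(cleaned)


# Usage, assuming `driver` is a running WebDriver on the page from the question:
# from selenium.webdriver.common.by import By
# tag1 = driver.find_element(By.CSS_SELECTOR, "tag1")
# print(collect_text(driver, tag1))
```

Because the script returns only strings, the Python side does not need to distinguish element nodes from text nodes.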
CodePudding user response:
As @Robert has already said, tag1 has 3 child nodes. You can get all of them with JavaScript code:
script = "return document.getElementsByTagName('tag1')[0].childNodes"
nodes = driver.execute_script(script)
Since tag2 is not a text node, Selenium "knows" how to handle it, so it will be returned as a WebElement instance. The other two are text nodes, and Selenium "doesn't know" how to handle them, so they will be returned as dictionaries (something similar to what the script code originally returns in JavaScript).
To get the required output and ignore tag2, you can do:
text = ' '.join([node['textContent'].strip() for node in nodes if isinstance(node, dict)])
The output of print(text) should be 'hello world'.
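If you would rather not depend on how the driver serializes text nodes, one option is to filter inside the script and return plain strings, one per direct text-node child. This is only a sketch under that assumption. The names direct_text_parts and DIRECT_TEXT_SCRIPT are illustrative, not Selenium API:

```python
# Hypothetical helper: returns the text of the direct text-node children
# of `element` as a list of strings, skipping child elements such as tag2.
DIRECT_TEXT_SCRIPT = """
return Array.from(arguments[0].childNodes)
    .filter(node => node.nodeType === Node.TEXT_NODE)
    .map(node => node.textContent);
"""


def direct_text_parts(driver, element):
    # The element is handed to the script as arguments[0].
    raw = driver.execute_script(DIRECT_TEXT_SCRIPT, element)
    # Trim each piece and drop nodes that contain only whitespace.
    return [part.strip() for part in raw if part.strip()]


# Usage, assuming `driver` is a running WebDriver and `tag1` was located
# with driver.find_element(By.CSS_SELECTOR, "tag1"):
# parts = direct_text_parts(driver, tag1)  # one string per text node
# text = " ".join(parts)
```

This gives the two strings separately, and joining the list with a space gives the combined form asked for in the question.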