Selenium select strings separated by tag

Suppose I have the following HTML code:

<tag1>
 "hello"
 <tag2></tag2>
 "world"
</tag1>

Then driver.find_element(By.CSS_SELECTOR, "tag1").text outputs the string "helloworld". How can I get the strings "hello" and "world" separately, or at least get a string "hello world" separated by whitespace?

CodePudding user response:

<tag1> has 3 child nodes: the text node "hello", the empty <tag2/>, and the text node "world". When you call .text on an element, it just combines the text representations of all its child nodes. You don't get a space between "hello" and "world" because whitespace does not matter in XML / HTML.

You can iterate over tag1's child nodes instead of calling .text, and then decide what to do with each child. If you know the structure is only 1 level deep, you can call .text on each child node and concatenate the results with a separator such as a space. If it goes deeper than one level of child nodes, you can recurse into the child nodes.
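
One way to sketch the "visit every level and join with a separator" idea is to parse the element's markup with Python's standard html.parser module. Note that this swaps in a different technique from walking the live DOM: it assumes you have already fetched the markup as a string, for example with get_attribute("outerHTML"), and the names TextChunkCollector and text_chunks are made up for illustration.

```python
from html.parser import HTMLParser


class TextChunkCollector(HTMLParser):
    """Collect every non-empty text run in document order, at any depth."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Invoked for the text found between tags; skip whitespace-only runs.
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def text_chunks(html):
    """Return the stripped text runs contained in an HTML string."""
    parser = TextChunkCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return parser.chunks


# Assumption: with Selenium the markup could come from
# driver.find_element(By.CSS_SELECTOR, "tag1").get_attribute("outerHTML").
sample = "<tag1>hello<tag2></tag2>world</tag1>"
chunks = text_chunks(sample)
print(chunks)            # ['hello', 'world']
print(" ".join(chunks))  # hello world
```

Unlike .text, this collects every text run in the markup, including text that is hidden by CSS, so treat the result as an approximation of what the page displays.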

CodePudding user response:

As @Robert has already said, tag1 has 3 child nodes. You can get all of them with JavaScript code:

script = "return document.getElementsByTagName('tag1')[0].childNodes"
nodes = driver.execute_script(script)

Since tag2 is an element node rather than a text node, Selenium "knows" how to handle it and returns it as a WebElement instance. The other two are text nodes, which Selenium "doesn't know" how to handle, so they are returned as dictionaries (something similar to what the script originally returns in JavaScript).

To get the required output and ignore tag2, you can do:

text = ' '.join([node['textContent'].strip() for node in nodes if isinstance(node, dict)])

The output of print(text) should be 'hello world'
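
A possible variation, if you would rather not depend on how text nodes come back from execute_script, is to let the JavaScript filter for text nodes and return plain strings. This is only a minimal sketch: SCRIPT and direct_text_chunks are hypothetical names, and it assumes driver is an already-started WebDriver and element is the WebElement for tag1.

```python
# JavaScript that runs in the browser: keep only the text nodes that are
# direct children of the element and return their trimmed, non-empty text.
SCRIPT = """
return Array.from(arguments[0].childNodes)
    .filter(node => node.nodeType === Node.TEXT_NODE)
    .map(node => node.textContent.trim())
    .filter(text => text.length > 0);
"""


def direct_text_chunks(driver, element):
    """Return the element's direct text nodes as a list of Python strings."""
    # The element is handed to the script as arguments[0].
    return driver.execute_script(SCRIPT, element)


# Usage (assumes a running browser session, so it is left as a comment):
# element = driver.find_element(By.CSS_SELECTOR, "tag1")
# text = " ".join(direct_text_chunks(driver, element))
```

Because the script returns an array of strings, the Python side receives an ordinary list and no isinstance check is needed. It only looks at direct children, so text nested inside tag2 would be skipped.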