Selenium: get all elements after specific text in webelement

Time:07-07

We have the following HTML:

<div>
  <img alt="Guest" >
  Bobby gave:   
  <img>
  <img>
  <img>
  <img>
   and took   
  <img>
  <img>
</div>

I want to get all the img elements between the first text and the second text, and then, separately, all the img elements after the second text.

The number of img elements varies, so the following Selenium code won't work:

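from selenium.webdriver.common.by import By

message = driver.find_element(By.TAG_NAME, 'div')
imgs = message.find_elements(By.TAG_NAME, 'img')
imgs_1 = imgs[1:5]  # the four images after 'Bobby gave:' in this sample only
imgs_2 = imgs[5:]   # the images after 'and took' in this sample only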

Any suggestions with xpath or something else?

CodePudding user response:

The code below is longer than it strictly needs to be because it explains each step; you should be able to shorten it or adapt it to your needs. Here it is:

# Use JS to get all the child nodes of the div element
nodes = driver.execute_script('return document.querySelector("div").childNodes')

# Build a cleaned copy of the `nodes` list in which:
# (1) each non-empty #text node is replaced by its stripped text value
# (2) #text nodes that contain only whitespace are dropped
# A new list is built because removing items from a list while
# iterating over it skips the item that follows each removal.

cleaned = []
for node in nodes:
    if isinstance(node, dict):  # a #text node (returned as a dict, not a WebElement)
        text = node['textContent'].strip()
        if text:
            cleaned.append(text)
    else:
        cleaned.append(node)
nodes = cleaned


# a list that would store index of valid #text nodes in `nodes` list
index_texts = []

for c, j in enumerate(nodes):
    if isinstance(j, str):
        index_texts.append(c)

# for convenience only
def node_type(o):
    if isinstance(o, str):
        return "String     :"
    else:
        return "Web_element:"

# Show a replica of the child nodes from your sample HTML,
# now available to you as a Python list.

print("##Print HTML child nodes replica as a Python list##\n")
for node in nodes:
    print(node_type(node), node)

print("------------")

# self-explanatory
print("##Print all elements between first two string/text element##\n")
for x in range(index_texts[0] + 1, index_texts[1]):
    v = nodes[x]
    print(node_type(v), v)


print("------------")

# self-explanatory
print("##Print all elements after second string/text element##\n")
for m in range(index_texts[1] + 1, len(nodes)):
    v = nodes[m]
    print(node_type(v), v)

Output:

##Print HTML child nodes replica as a Python list##

Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="b4ea1739-3f9c-4ff2-8750-8caf7b30aad5")>
String     : Bobby gave:
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="6f45369d-7c92-4cf3-ae64-1d70f8576708")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="4e73eda0-eae0-4915-b8a8-d846e93d5552")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="bf5a0f24-8772-468a-9a68-cdb183ba23bd")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="cb3eb86f-e53d-4089-8a2c-0deb96d0eff2")>
String     : and took
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="992b0798-9738-4b72-a706-56d8b74b0065")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="21fea20b-51fd-4926-bd5e-e78b781f2850")>
------------
##Print all elements between first two string/text element##

Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="6f45369d-7c92-4cf3-ae64-1d70f8576708")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="4e73eda0-eae0-4915-b8a8-d846e93d5552")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="bf5a0f24-8772-468a-9a68-cdb183ba23bd")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="cb3eb86f-e53d-4089-8a2c-0deb96d0eff2")>
------------
##Print all elements after second string/text element##

Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="992b0798-9738-4b72-a706-56d8b74b0065")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="21fea20b-51fd-4926-bd5e-e78b781f2850")>

You now have the nodes list. If your requirements get more complex, say the HTML contains three text values and you want the img (or other) elements between the second and third, you should be able to pull those elements out of that Python list with a little extra logic.
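
As a rough sketch of that extra logic, the function below groups the items of a list shaped like nodes (strings for text, anything else for elements) under the text value that precedes them. The name group_by_preceding_text and the FakeImg stand-in class are made up for illustration; in real use the non-string items would be the WebElements returned by Selenium. It assumes each text value appears only once; repeated texts would be merged into one group.

```python
def group_by_preceding_text(nodes):
    """Map each text value to the non-text items that follow it,
    up to the next text value. Items before the first text go under None."""
    groups = {None: []}
    current = None
    for node in nodes:
        if isinstance(node, str):
            # A text value starts a new group
            current = node
            groups.setdefault(current, [])
        else:
            # Anything else (e.g. a WebElement) joins the current group
            groups[current].append(node)
    return groups


class FakeImg:
    """Hypothetical stand-in for a WebElement, so the sketch runs without a browser."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<img {self.name}>"


# Three text values, as in the scenario described above
sample = [
    FakeImg("guest"),
    "Bobby gave:", FakeImg("a"), FakeImg("b"),
    "and took", FakeImg("c"),
    "then lost", FakeImg("d"),
]

groups = group_by_preceding_text(sample)
# Elements between the second and third text values
print(groups["and took"])
```

With the real nodes list, groups['Bobby gave:'] and groups['and took'] would hold the two sets of img elements asked for in the question.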

CodePudding user response:

If you know both text values, you can select the images between 'Bobby gave:' and 'and took' with XPath:

//div/text()[normalize-space()='Bobby gave:']/following-sibling::img[following-sibling::text()[normalize-space(.)='and took']]

You can select the images after the second text node with:

//div/text()[normalize-space()='and took']/following-sibling::img
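
A minimal sketch of how these two expressions might be used from Python follows. The helper names (imgs_between_xpath, imgs_after_xpath, find_img_groups) are hypothetical, and the string building assumes the text values contain no single quotes. The expressions pass through text nodes but return img elements, so they can be handed to find_elements.

```python
def imgs_between_xpath(first_text, second_text):
    """XPath for img siblings that follow `first_text` and precede `second_text`."""
    return (
        f"//div/text()[normalize-space()='{first_text}']"
        f"/following-sibling::img"
        f"[following-sibling::text()[normalize-space(.)='{second_text}']]"
    )


def imgs_after_xpath(text):
    """XPath for all img siblings that follow the text node `text`."""
    return f"//div/text()[normalize-space()='{text}']/following-sibling::img"


def find_img_groups(driver, first_text, second_text):
    """Return (images between the two texts, images after the second text)."""
    # Imported here so the XPath helpers above work without Selenium installed
    from selenium.webdriver.common.by import By

    between = driver.find_elements(
        By.XPATH, imgs_between_xpath(first_text, second_text)
    )
    after = driver.find_elements(By.XPATH, imgs_after_xpath(second_text))
    return between, after


print(imgs_between_xpath("Bobby gave:", "and took"))
print(imgs_after_xpath("and took"))
```

With a live driver, the call would be imgs_1, imgs_2 = find_img_groups(driver, 'Bobby gave:', 'and took').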