Combine the alt values of all img tags inside a Selenium WebElement into one string

Is there a robust way to combine the alt values of all img tags into one string that forms a sentence, without losing the position of the two text fragments between them? (Note: the number of img tags between the two text fragments can vary.)

<img alt="one">
Bobby gave: 
<img alt="two">
<img alt="two">
<img alt="two">
 and took 
<img alt="three">
<img alt="four">

Desired output:

sentence = "one Bobby gave: two two two and took three four"

I tried:

from selenium.webdriver.common.by import By

images = driver.find_elements(By.TAG_NAME, "img")
sentence = [img.get_attribute("alt") for img in images]

But this only gives me the alt values. The text fragments "Bobby gave:" and "and took" are lost, so they don't end up between the right images.

Answer:

I'm no Selenium expert, so here's an alternative using BeautifulSoup, which pairs well with Selenium (for example, by passing it driver.page_source):

from bs4 import BeautifulSoup

html = """
<img alt="one">
Bobby gave: 
<img alt="two">
<img alt="two">
<img alt="two">
 and took 
<img alt="three">
<img alt="four">
"""

soup = BeautifulSoup(html, "html.parser")
images = soup.find_all("img")

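# Pair each alt value with the first text node after it
# (whitespace-only nodes between two images strip down to "")
pieces = (
    img["alt"] + " " + img.find_next(string=True).strip()
    for img in images
)

# Split and re-join to collapse the doubled spaces those empty pieces leave
print(" ".join(" ".join(pieces).split()))

Output:

one Bobby gave: two two two and took three four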
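One caveat with find_next: if two images sit side by side with no whitespace between them (e.g. <img alt="a"><img alt="b"> x), the first image's find_next skips past the second image and picks up " x" too, duplicating the text. If you'd rather avoid that and the extra dependency, here is a minimal sketch using only the standard library's html.parser, which reports tags and text in document order. The class and function names (AltTextJoiner, alt_sentence) are hypothetical, and it assumes you pass in the markup of the container only, e.g. element.get_attribute("innerHTML") from Selenium.

```python
from html.parser import HTMLParser


class AltTextJoiner(HTMLParser):
    """Collects <img> alt values and non-blank text in document order."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        # Self-closing <img ... /> also lands here, because the default
        # handle_startendtag calls handle_starttag.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_data(self, data):
        # Skip the whitespace-only text between tags
        text = data.strip()
        if text:
            self.parts.append(text)


def alt_sentence(html):
    """Return alt values and text joined into one sentence."""
    parser = AltTextJoiner()
    parser.feed(html)
    parser.close()
    # Split and re-join to collapse any runs of whitespace inside text nodes
    return " ".join(" ".join(parser.parts).split())


# Hypothetical usage with Selenium:
#   sentence = alt_sentence(element.get_attribute("innerHTML"))
```

Because each image and each text node is visited exactly once, adjacent images can't cause duplicated text.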

Answer:

Let's say the HTML looks like this (I added a parent node to your sample):

<div>
  <img alt="one">
  Bobby gave: 
  <img alt="two">
  <img alt="two">
  <img alt="two">
   and took 
  <img alt="three">
  <img alt="four">
</div>

We can get all of the parent node's children (both images and text nodes):

child_nodes = driver.execute_script("return document.querySelector('div').childNodes;")

Then extract the required values:

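nodes = []
for child in child_nodes:
    try:
        # Text nodes come back as dict-like objects (this may vary by driver);
        # keep their text only if it is not just whitespace
        text = child['textContent'].strip()
        if text:
            nodes.append(text)
    except TypeError:
        # <img> elements come back as WebElements, which are not subscriptable
        nodes.append(child.get_attribute('alt'))
sentence = ' '.join(nodes)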

Output:

'one Bobby gave: two two two and took three four'
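The try/except above relies on text nodes coming back from execute_script as dict-like objects, which may not hold for every browser driver. A minimal sketch that sidesteps this does the filtering in JavaScript and returns only strings, which Selenium hands back as a plain Python list. JS_FRAGMENTS and sentence_from_fragments are hypothetical names, and the container is assumed to be the div from the sample above; the driver calls are shown as comments because they need a live browser.

```python
# JavaScript run in the browser: walk the container's child nodes in order,
# keeping <img> alt values and non-blank text. arguments[0] is the
# WebElement passed in from Python.
JS_FRAGMENTS = """
const parts = [];
for (const node of arguments[0].childNodes) {
  if (node.nodeType === Node.TEXT_NODE) {
    const text = node.textContent.trim();
    if (text) parts.push(text);
  } else if (node.nodeName === 'IMG' && node.alt) {
    parts.push(node.alt);
  }
}
return parts;
"""


def sentence_from_fragments(fragments):
    """Join the ordered fragments, collapsing internal whitespace."""
    return " ".join(" ".join(fragments).split())


# Usage with a live driver (assumed locator, not run here):
#   from selenium.webdriver.common.by import By
#   container = driver.find_element(By.TAG_NAME, "div")
#   fragments = driver.execute_script(JS_FRAGMENTS, container)
#   sentence = sentence_from_fragments(fragments)
```

Since the script returns only strings, there is no need to guess which Python type each child node was converted to.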