Is there a robust way to collect the alt values of all the img tags into a single sentence string without losing the positions of the two text fragments between them? (Note: the number of img tags between the two texts can vary.)
<img alt="one">
Bobby gave:
<img alt="two">
<img alt="two">
<img alt="two">
and took
<img alt="three">
<img alt="four">
Desired output:
sentence = "one Bobby gave: two two two and took three four"
I tried:
from selenium.webdriver.common.by import By

images = driver.find_elements(By.TAG_NAME, "img")
sentence = [img.get_attribute("alt") for img in images]
But this only gives me the alt values. The two text fragments "Bobby gave:" and "and took" are lost, along with their positions between the tags.
Answer:
I'm no master of Selenium, so here's an alternative using BeautifulSoup, a tool that goes hand in hand with Selenium:
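from bs4 import BeautifulSoup

html = """
<img alt="one">
Bobby gave:
<img alt="two">
<img alt="two">
<img alt="two">
and took
<img alt="three">
<img alt="four">
"""

soup = BeautifulSoup(html, "html.parser")
images = soup.find_all("img")

# Pair each alt value with the text node that directly follows its <img>;
# between two images that node is only whitespace, so it strips to "".
parts = [
    img["alt"] + " " + img.find_next(string=True).strip()
    for img in images
]
# Split and re-join to collapse the extra spaces left by the empty parts.
print(" ".join(" ".join(parts).split()))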
Output:
one Bobby gave: two two two and took three four
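If you would rather not add a dependency, the same idea can be sketched with Python's standard-library html.parser instead of BeautifulSoup (a swapped-in technique, not the answerer's method). Rather than looking ahead from each image, it walks the markup once in document order and records every img alt value and every non-blank text run as it meets them, so text before the first image or after the last one is kept as well. The names AltTextCollector and sentence_from_html are hypothetical; with Selenium you could feed it driver.page_source or a parent element's get_attribute("innerHTML"). Note that it would also pick up text inside script or style tags.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Hypothetical helper: collects <img> alt values and text in document order."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_data(self, data):
        # Skip whitespace-only runs such as the newlines between tags
        text = data.strip()
        if text:
            self.parts.append(text)


def sentence_from_html(html):
    collector = AltTextCollector()
    collector.feed(html)
    collector.close()  # flush any text buffered after the last tag
    return " ".join(collector.parts)


html = """
<img alt="one">
Bobby gave:
<img alt="two">
<img alt="two">
<img alt="two">
and took
<img alt="three">
<img alt="four">
"""

print(sentence_from_html(html))
```

Because the parts are collected in the order the parser sees them, the number of images between the two texts does not matter.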
Answer:
Let's say the HTML looks like this (I added a parent node to your sample):
<div>
<img alt="one">
Bobby gave:
<img alt="two">
<img alt="two">
<img alt="two">
and took
<img alt="three">
<img alt="four">
</div>
We can get all of the parent node's children (both the images and the text nodes):
child_nodes = driver.execute_script("return document.querySelector('div').childNodes;")
Then extract the required values:
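nodes = []
for child in child_nodes:
    try:
        # In this setup, text nodes came back as dicts with a 'textContent' key
        text = child['textContent'].strip()
        if text:  # skip whitespace-only text nodes
            nodes.append(text)
    except TypeError:
        # WebElements (the <img> tags) are not subscriptable, so read alt instead
        nodes.append(child.get_attribute('alt'))
sentence = ' '.join(nodes)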
Output:
'one Bobby gave: two two two and took three four'
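How text nodes returned by execute_script are serialized can vary between drivers, so a variant worth trying is to let JavaScript do the extraction and return only plain strings. This is a sketch under assumptions: the helper name sentence_from_children and the COLLECT_JS script are hypothetical, and the selector must match an existing parent element (querySelector returns null otherwise and the script will throw). It relies only on execute_script passing extra arguments to the script as arguments[0], arguments[1], and so on.

```python
# Hypothetical script: walks the parent's child nodes in order and returns
# a list of strings (img alt values and trimmed, non-empty text).
COLLECT_JS = """
const parent = document.querySelector(arguments[0]);
return Array.from(parent.childNodes)
  .map(node => {
    if (node.nodeType === Node.TEXT_NODE) return node.textContent.trim();
    if (node.nodeType === Node.ELEMENT_NODE && node.tagName === 'IMG') {
      return node.getAttribute('alt') || '';
    }
    return '';
  })
  .filter(part => part.length > 0);
"""


def sentence_from_children(driver, selector="div"):
    """Hypothetical helper: build the sentence from the children of `selector`."""
    # Only strings cross the WebDriver boundary, so no WebElement handling is needed
    parts = driver.execute_script(COLLECT_JS, selector)
    return " ".join(parts)
```

With a live driver this would be called as sentence = sentence_from_children(driver, "div"). Keeping the node-type checks in the browser avoids the try/except on Python-side types altogether.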