BeautifulSoup: how to get text while ignoring elements

Time:09-29

Is it possible to extract only the text (ignoring the tags) from the following structure:

<font>
   <em>X</em>
   and
   <em>Y</em>
</font>

to obtain the following output:

output = "X and Y"
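For context, a plain get_text() call with no arguments does not give that output directly. This is a minimal sketch that reuses the sample markup above and assumes the "html.parser" backend. It shows that the default call concatenates every text node unchanged, so the newlines and indentation between the tags are kept.

```python
from bs4 import BeautifulSoup

html_doc = """\
<font>
   <em>X</em>
   and
   <em>Y</em>
</font>"""

soup = BeautifulSoup(html_doc, "html.parser")

# get_text() with no arguments concatenates every text node as-is,
# including the whitespace-only nodes around the <em> tags
raw = soup.find("font").get_text()
print(repr(raw))
```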

Answer:

Try:

from bs4 import BeautifulSoup

html_doc = """\
<font>
   <em>X</em>
   and
   <em>Y</em>
</font>"""

soup = BeautifulSoup(html_doc, "html.parser")

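# strip=True trims whitespace from each text fragment and skips
# fragments that are only whitespace; separator=" " joins the rest
out = soup.find("font").get_text(strip=True, separator=" ")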
print(out)

Prints:

X and Y
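The same result can likely be obtained by joining the tag's stripped_strings generator. It yields each text node with the surrounding whitespace removed and skips whitespace-only nodes, which matches what get_text(strip=True, separator=" ") does here. This is a sketch on the same sample markup. The `collapsed` variant is an alternative, not part of the answer above. It splits the raw text on any whitespace, so it also squeezes newlines that sit inside a single text node.

```python
from bs4 import BeautifulSoup

html_doc = """\
<font>
   <em>X</em>
   and
   <em>Y</em>
</font>"""

soup = BeautifulSoup(html_doc, "html.parser")
font = soup.find("font")

# stripped_strings: each text node stripped, whitespace-only nodes skipped
joined = " ".join(font.stripped_strings)

# Alternative: collapse every whitespace run in the full text to one space
collapsed = " ".join(font.get_text().split())

print(joined)
print(collapsed)
```

Both approaches agree on this input. They differ only when one text node contains internal line breaks: stripped_strings keeps them, while the split/join version turns them into single spaces.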