I am trying to convert an HTML string into a plain string using Python.
Here is the content I'm trying to convert:
htmltxt = "<b>Hello World</b>"
The result should appear as Hello World in bold. Instead I'm getting
<html><body><b>Hello World</b></body></html>
with the below snippet of code
from bs4 import BeautifulSoup
htmltxt = "<b>Hello World</b>"
soup = BeautifulSoup(htmltxt, 'lxml')
Can anyone suggest how to convert it?
CodePudding user response:
In this situation you're trying to find a tag within your soup object. Given that it is the only tag and it has no id or class name, you can use:
hello_world_tag = soup.find("b")
hello_world_tag_text = hello_world_tag.text
print(hello_world_tag_text) # Output: 'Hello World'
The key here is '.text'. Using Beautiful Soup to find a specific tag returns that entire tag, markup included, but the .text attribute returns just the text inside that tag.
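To make the difference between the tag and its text concrete, here is a minimal sketch. It assumes the built-in "html.parser" backend instead of 'lxml' (an assumption made so the fragment is not wrapped in <html><body>); the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

htmltxt = "<b>Hello World</b>"

# Assumption: "html.parser" ships with Python and leaves the fragment unwrapped.
soup = BeautifulSoup(htmltxt, "html.parser")

tag = soup.find("b")        # the whole Tag object, markup included
tag_markup = str(tag)       # string form of the tag
tag_text = tag.text         # only the text inside the tag
same_text = tag.get_text()  # method form, same result here

print(tag_markup)
print(tag_text)
```

Printing the tag shows the markup, while printing its text shows only the characters between the opening and closing tags.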
Edit following comment:
I would still recommend using bs4 to parse HTML. Once you have your text, if you'd like it in bold in a terminal, you can print it with:
print('\033[1m' + hello_world_tag_text + '\033[0m')  # '\033[0m' resets the formatting afterwards
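The escape codes can be wrapped in a small helper. This is only a sketch: the function name bold is hypothetical, and it assumes the output goes to a terminal that understands ANSI escape sequences (in other places the codes show up as stray characters).

```python
BOLD = "\033[1m"   # ANSI code: switch bold on
RESET = "\033[0m"  # ANSI code: reset all formatting


def bold(text: str) -> str:
    """Wrap text in ANSI codes so an ANSI-capable terminal shows it in bold."""
    return f"{BOLD}{text}{RESET}"


print(bold("Hello World"))
```

Resetting at the end keeps the bold style from leaking into whatever is printed next.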
CodePudding user response:
Note: You won't get a bold string per se. Bold is not a property of a Python string; it always has to be produced by whatever interprets or formats the output.
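As one way to picture "interpreting" the markup, the sketch below walks the parsed tree and turns <b> and <strong> tags into ANSI bold codes for a terminal. The render function is hypothetical, "html.parser" is an assumed backend, and nested bold tags are not handled (an inner reset would end the outer bold early).

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

BOLD, RESET = "\033[1m", "\033[0m"


def render(node):
    """Return the node's text, wrapping <b>/<strong> content in ANSI bold codes."""
    if isinstance(node, NavigableString):
        return str(node)  # plain text node: emit unchanged
    inner = "".join(render(child) for child in node.children)
    if node.name in ("b", "strong"):
        return BOLD + inner + RESET
    return inner  # any other tag: keep only its content


soup = BeautifulSoup("Say <b>Hello</b> World", "html.parser")
rendered = render(soup)
print(rendered)
```

A different target (a GUI label, Markdown, a PDF) would need its own mapping from tags to formatting.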
To extract text from an HTML string with BeautifulSoup, you can use the text attribute or call the get_text() method:
from bs4 import BeautifulSoup
htmltxt = "<b>Hello World</b>"
soup = BeautifulSoup(htmltxt, 'lxml')
print(soup.text)  # Output: Hello World
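When the markup contains several elements, get_text() accepts separator and strip arguments that control how the pieces are joined. A short sketch, assuming the built-in "html.parser" backend and a made-up two-paragraph input:

```python
from bs4 import BeautifulSoup

# Hypothetical input with two block elements and no whitespace between them.
html = "<p>Hello</p><p>World</p>"
soup = BeautifulSoup(html, "html.parser")

joined = soup.get_text()                           # pieces concatenated directly
spaced = soup.get_text(separator=" ", strip=True)  # pieces stripped, then joined with a space

print(joined)
print(spaced)
```

Without a separator, adjacent elements run together, so passing one is usually what you want for multi-element input.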