I'm using BeautifulSoup, and I need to get the xxx string from the following line:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p > xxx <br/> yyy <br/></p>',
'html.parser')
Usually, I would do the following:
one_a_tag = soup.p
t = one_a_tag.string
t
But that doesn't work: it returns None. However, if I delete <br/> yyy <br/>, the code starts working. How do I extract xxx from the initial line?
CodePudding user response:
Try using .strings:
x_and_y = list(soup.p.strings)
print(x_and_y)
Output: [' xxx ', ' yyy ']
.strings is a generator, so the list() call is needed to print its contents, but you can also iterate over it with a for loop.
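As a small sketch of the for-loop variant mentioned above, the snippet below iterates over .strings directly. It also uses .stripped_strings, which yields the same text nodes with surrounding whitespace removed. The variable names (pieces, first) are only illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p > xxx <br/> yyy <br/></p>', 'html.parser')

# Iterate over the generator directly instead of building a list first.
pieces = []
for s in soup.p.strings:
    pieces.append(s)

# .stripped_strings yields the text nodes with whitespace trimmed,
# so the first item is the string the question asks for.
first = next(soup.p.stripped_strings)

print(pieces)
print(first)
```

If only the first string is needed, next() avoids walking the rest of the tag.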
CodePudding user response:
I'm getting the output as follows:
from bs4 import BeautifulSoup
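soup = BeautifulSoup('<p > xxx <br/> yyy <br/></p>', 'html.parser')
# The sample <p> carries no class, so select it by tag name; a selector such as
# 'p.object-attr-value' only matches when the tag actually has that class.
text = soup.select_one('p').text
print(text.split()[0])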
Output:
xxx
CodePudding user response:
The reason you get None as output when using .string is (from the documentation):
"If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None"
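To illustrate the documented behaviour, here is a minimal sketch comparing a tag with a single text child against the tag from the question. The names single and mixed are made up for the example.

```python
from bs4 import BeautifulSoup

single = BeautifulSoup('<p> xxx </p>', 'html.parser')
mixed = BeautifulSoup('<p > xxx <br/> yyy <br/></p>', 'html.parser')

# One child (a single text node): .string returns that text node.
print(repr(single.p.string))

# Several children (text nodes plus <br/> tags): .string is None.
print(mixed.p.string)
```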
So, to get the text xxx from your example, you can call .find() and pass text=True as an argument (in Beautiful Soup 4.4.0 and later the same argument is also available under the name string):
from bs4 import BeautifulSoup
soup = BeautifulSoup(
'<p > xxx <br/> yyy <br/></p>', 'html.parser'
)
one_a_tag = soup.p
print(one_a_tag.find(text=True).strip())
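If the tag might contain no text at all, .find() returns None and calling .strip() on it fails. A possible guard is sketched below; first_text is a hypothetical helper name, and the sketch assumes Beautiful Soup 4.4.0 or later, where the argument can be spelled string=True.

```python
from bs4 import BeautifulSoup


def first_text(tag):
    """Return the first text node inside tag, stripped, or None if there is none."""
    node = tag.find(string=True)  # first text node anywhere inside the tag
    return node.strip() if node is not None else None


soup = BeautifulSoup('<p > xxx <br/> yyy <br/></p>', 'html.parser')
print(first_text(soup.p))

# A tag with no text nodes: the helper returns None instead of raising.
empty = BeautifulSoup('<p><br/></p>', 'html.parser')
print(first_text(empty.p))
```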