How to use BeautifulSoup to find an HTML element that contains spaces in its attributes
<h1 class='td p1'>
title that i want
</h1>
<h1 class='td p2'>
title that i don't want
</h1>
<h1 class='p1'>
title that i don't want
</h1>
I would like to know how to use soup.find to find the title that I want. The difficulty is that BeautifulSoup represents the attrs of that h1 like this: {'class': ['td', 'p1']}, and not like this: {'class': ['td p1']}.
CodePudding user response:
from bs4 import BeautifulSoup

html = """<h1 class='td p1'>
title that i want
</h1>
<h1 class='td p2'>
title that i don't want
</h1>
<h1 class='p2'>
title that i don't want
</h1>"""
soup = BeautifulSoup(html, "lxml")
content = soup.find('h1', attrs={'class':'td p1'})
output:
>>> print(content)
<h1 class="td p1">
title that i want
</h1>
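A minimal sketch of why the string 'td p1' works here, assuming bs4 is installed. The sample markup and the variable names (first, exact) are illustrative and not from the question. As far as I understand the matching rules, a class string containing a space is compared against the whole attribute value, so the match is sensitive to both token order and extra classes.

```python
from bs4 import BeautifulSoup

# Illustrative markup: same two classes, in different arrangements.
html = """<h1 class='td p1'>wanted</h1>
<h1 class='p1 td'>same classes, reversed order</h1>
<h1 class='td p1 extra'>extra class</h1>"""

soup = BeautifulSoup(html, "html.parser")

# The class attribute is multi-valued, so it is parsed into a list of tokens.
first = soup.find("h1")
print(first["class"])

# A string with a space is compared against the full attribute value,
# so only the element whose class attribute reads exactly "td p1" matches.
exact = [h.get_text() for h in soup.find_all("h1", attrs={"class": "td p1"})]
print(exact)
```

If the class order in the page can vary, this exact-string form may miss elements, and a CSS selector is the more forgiving choice.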
CodePudding user response:
Note: The two approaches below differ, but both select the classes explicitly.
find()
soup.find('h1', attrs={'class':'td p1'})
select_one()
soup.select_one('h1.td.p1')
Example
from bs4 import BeautifulSoup
data = """
<h1 class='td p1'>
title that i want
</h1>
<h1 class='td p2'>
title that i don't want
</h1>
<h1 class='p1'>
title that i don't want
</h1>
"""
soup = BeautifulSoup(data, "html.parser")
title = soup.select_one('h1.td.p1')
print(title)
Output
<h1 class="td p1">
title that i want
</h1>
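One caveat worth illustrating: a CSS class selector such as h1.td.p1 ignores token order and also matches elements that carry additional classes. The sketch below, assuming bs4 with its bundled CSS selector support, shows that behaviour next to a function filter that requires exactly the two classes. The helper has_exactly_td_p1 and the sample markup are hypothetical, introduced only for this example.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the question.
data = """
<h1 class='td p1'>wanted</h1>
<h1 class='p1 td'>reversed order</h1>
<h1 class='td p1 extra'>extra class</h1>
<h1 class='td p2'>other</h1>
"""

soup = BeautifulSoup(data, "html.parser")

# CSS class selectors ignore token order and allow additional classes.
css_matches = [h.get_text() for h in soup.select("h1.td.p1")]

# Hypothetical helper: match only when the class set is exactly {"td", "p1"}.
def has_exactly_td_p1(tag):
    return tag.name == "h1" and set(tag.get("class", [])) == {"td", "p1"}

exact_set_matches = [h.get_text() for h in soup.find_all(has_exactly_td_p1)]

print(css_matches)
print(exact_set_matches)
```

Which form to prefer depends on the page: the selector is convenient when extra classes are harmless, while the function filter is stricter.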