BeautifulSoup finds an HTML element that contains spaces in its attributes

How can I use BeautifulSoup to find an HTML element whose attribute value contains spaces?

<h1 class='td p1'>
    title that i want
</h1>
<h1 class='td p2'>
    title that i don't want
</h1>
<h1 class='p1'>
    title that i don't want
</h1>

I would like to know how to use soup.find to get the title that I want.
The problem is that BeautifulSoup parses the attributes of the title that I want like this: {'class': ['td', 'p1']}

and not like this: {'class': ['td p1']}
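To see what the question describes, the sketch below (assuming `bs4` is installed; `html.parser` is used so that lxml is not required) collects the `class` value of each `<h1>`. Because `class` is treated as a multi-valued attribute, BeautifulSoup splits it on whitespace into a list.

```python
from bs4 import BeautifulSoup

html = """
<h1 class='td p1'>
    title that i want
</h1>
<h1 class='td p2'>
    title that i don't want
</h1>
<h1 class='p1'>
    title that i don't want
</h1>
"""

soup = BeautifulSoup(html, "html.parser")

# 'class' is multi-valued, so each value is stored as a list of class names
class_lists = [list(h1["class"]) for h1 in soup.find_all("h1")]
print(class_lists)  # [['td', 'p1'], ['td', 'p2'], ['p1']]
```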

Answer:

from bs4 import BeautifulSoup

html = """<h1 class='td p1'>
    title that i want
</h1>
<h1 class='td p2'>
    title that i don't want
</h1>
<h1 class='p2'>
    title that i don't want
</h1>"""

soup = BeautifulSoup(html, "lxml")
content = soup.find('h1', attrs={'class':'td p1'})

Output:

>>> print(content)
<h1 class="td p1">
    title that i want
</h1>
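BeautifulSoup also lets you search for the exact string value of the `class` attribute, which is why `attrs={'class': 'td p1'}` works here. That comparison is against the whole string, so it is order-sensitive. A minimal sketch, using made-up markup where the same two classes appear in the opposite order:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, written in the opposite order
html = "<h1 class='p1 td'>title with reversed classes</h1>"
soup = BeautifulSoup(html, "html.parser")

# Exact-string match against the whole class value: 'p1 td' != 'td p1'
reversed_match = soup.find("h1", attrs={"class": "td p1"})
print(reversed_match)  # None

# Writing the string in the same order as the markup does match
same_order = soup.find("h1", attrs={"class": "p1 td"})
print(same_order.get_text())  # title with reversed classes
```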

Answer:

Note: these are two different approaches, but both select the classes explicitly.

find()

soup.find('h1', attrs={'class':'td p1'})

select_one()

soup.select_one('h1.td.p1')
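The CSS selector `h1.td.p1` is evaluated by Soup Sieve, the selector engine behind `select()` and `select_one()`, and it only requires the element to carry both classes, in any order and alongside any other classes. The markup below is hypothetical, chosen to show how this differs from the exact-string `find()` match:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class order and an extra class vary between elements
html = """
<h1 class='p1 td'>reversed order</h1>
<h1 class='td extra p1'>extra class</h1>
<h1 class='td p2'>wrong second class</h1>
"""
soup = BeautifulSoup(html, "html.parser")

# '.td.p1' means "has class td AND has class p1", regardless of order
css_hits = [h1.get_text() for h1 in soup.select("h1.td.p1")]
print(css_hits)  # ['reversed order', 'extra class']

# The exact-string find() matches none of these three elements
print(soup.find("h1", attrs={"class": "td p1"}))  # None
```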

Example

from bs4 import BeautifulSoup
data="""
<h1 class='td p1'>
    title that i want
</h1>
<h1 class='td p2'>
    title that i don't want
</h1>
<h1 class='p1'>
    title that i don't want
</h1>
"""
soup=BeautifulSoup(data,"html.parser")

title = soup.select_one('h1.td.p1')

print(title)

Output

<h1 class="td p1">
    title that i want
</h1>
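If you need exactly the classes `td` and `p1`, no more and no fewer, in any order, one option not taken from the answers above is to pass a function to `find()`: BeautifulSoup calls it for each tag and keeps the tags for which it returns True. The helper `has_exact_classes` and the markup are hypothetical, for illustration only:

```python
from bs4 import BeautifulSoup

html = """
<h1 class='p1 td'>reversed order</h1>
<h1 class='td extra p1'>extra class</h1>
<h1 class='td p2'>wrong second class</h1>
"""
soup = BeautifulSoup(html, "html.parser")

def has_exact_classes(tag, wanted=frozenset({"td", "p1"})):
    """Hypothetical helper: True for <h1> tags whose set of classes equals `wanted`."""
    return tag.name == "h1" and set(tag.get("class", [])) == wanted

match = soup.find(has_exact_classes)
print(match.get_text())  # reversed order

# With find_all, the tag that has an extra class is excluded
print(len(soup.find_all(has_exact_classes)))  # 1
```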