Using BeautifulSoup to find text in HTML

This is my first time using BeautifulSoup as a scraping tool, and I'm working through it slowly, one step at a time.

I've used soup.find_all("div", class_="product-box__inner") to get a list of the elements I want, but this particular part isn't clicking for me right now. My question is below.

Here is the HTML. My target is "$0". I have tried element.find("span", title=re.compile("$")), and I can't use element.select("dt > dd > span > span") because there are multiple elements with the same tag structure that I don't need at all. Is there a way to target the span with data-fees-annual-value="" so that .text works?

<div >
    <dt >Annual fee</dt>
    <dd >
        <span>
            <span data-fees-annual-value="">$0</span>
        </span>
    </dd>
</div>

Answer:

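If you want to find an element by its text, use string instead of title (title filters on the element's title attribute). Escape the dollar sign, because a bare $ in a regular expression is an end-of-string anchor that matches any text:

element.find("span", string=re.compile(r'\$'))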

Output:

<span data-fees-annual-value="">$0</span>
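As an alternative sketch (not part of the original answer), the snippet below shows why the bare "$" pattern matches anything and how find can target the data-fees-annual-value attribute directly through the attrs argument, since a hyphenated attribute name can't be passed as a keyword argument. It reuses the question's fragment with the empty class attributes dropped.

```python
import re

from bs4 import BeautifulSoup

# The fragment from the question (empty class attributes omitted).
html = """
<div>
    <dt>Annual fee</dt>
    <dd>
        <span>
            <span data-fees-annual-value="">$0</span>
        </span>
    </dd>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "$" alone is an end-of-string anchor, so it matches any text at all.
print(bool(re.search("$", "Annual fee")))   # True
# Escaping it matches only a literal dollar sign.
print(bool(re.search(r"\$", "Annual fee")))  # False

# data-* names can't be keyword arguments, so pass them via attrs.
# True means "the attribute is present", whatever its value (here it is empty).
fee = soup.find("span", attrs={"data-fees-annual-value": True})
print(fee.text)  # $0
```

Matching on the attribute avoids depending on the text format at all, so it keeps working if the fee changes from "$0" to another amount.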

Answer:

You were close with CSS selectors; make the selector more specific by referencing the data-fees-annual-value attribute directly:

soup.select_one('span[data-fees-annual-value]').text

Example

from bs4 import BeautifulSoup

html="""
<div >
    <dt >Annual fee</dt>
    <dd >
        <span>
            <span data-fees-annual-value="">$0</span>
        </span>
    </dd>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first tag that has a data-fees-annual-value attribute
print(soup.select_one('span[data-fees-annual-value]').text)

Output

$0
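Because the question starts from a list returned by find_all, the same selector can be applied to each element in turn. The sketch below assumes a hypothetical page with three product boxes; only the class name product-box__inner comes from the question, and the markup inside each box is invented for illustration. select_one returns None when a box has no fee span, so the loop checks for that instead of calling .text on None.

```python
from bs4 import BeautifulSoup

# Hypothetical page: only the class "product-box__inner" comes from the question.
html = """
<div class="product-box__inner">
    <dt>Annual fee</dt>
    <dd><span><span data-fees-annual-value="">$0</span></span></dd>
</div>
<div class="product-box__inner">
    <dt>Annual fee</dt>
    <dd><span><span data-fees-annual-value="">$95</span></span></dd>
</div>
<div class="product-box__inner">
    <dt>Interest rate</dt>
    <dd><span><span>20%</span></span></dd>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

annual_fees = []
for box in soup.find_all("div", class_="product-box__inner"):
    # Search only inside this box; None means the box has no fee span.
    fee = box.select_one("span[data-fees-annual-value]")
    annual_fees.append(fee.get_text(strip=True) if fee else None)

print(annual_fees)  # ['$0', '$95', None]
```

Keeping None for boxes without a fee preserves the one-to-one mapping between boxes and results, which helps if other fields are extracted from the same boxes later.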