Finding text from HTML using BeautifulSoup

I have the following .html:

<li class="print text">
                            <span><em >
                                    <div >1.29 s</div>
                                </em><em >passed</em>This is the text I want to get</span>

I need to get only the text that sits directly inside the span, outside all of the nested tags (here: This is the text I want to get).

I was trying to use this piece of code:

for el in doc.find_all('li', attrs={'class': 'print text'}):
    print(el.get_text())

But unfortunately it prints everything, including the text inside the em tags.

Is there any way to do this?

Thank you!!

CodePudding user response：

Find the specific li tag by its class, use find_all to collect its em tags, take the last one by indexing, and read its next_sibling, which is the text node that follows it.

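from bs4 import BeautifulSoup

# The class attribute on <li> is assumed from the question's own selector.
html = """<li class="print text">
        <span><em >
                <div >1.29 s</div>
            </em><em >passed</em>This is the text I want to get</span>"""

soup = BeautifulSoup(html, "html.parser")
# Take the last <em> inside the <li>; the node right after it is the wanted text.
print(soup.find("li", class_="print text").find_all("em")[-1].next_sibling)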
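The one-liner above assumes that the li exists, that it contains at least one em, and that a text node directly follows the last em. A minimal, more defensive sketch of the same idea could look like this; the helper name text_after_last_em is made up for illustration, and the class="print text" attribute is assumed from the question's code:

```python
from bs4 import BeautifulSoup, NavigableString

html = """<li class="print text">
        <span><em>
                <div>1.29 s</div>
            </em><em>passed</em>This is the text I want to get</span></li>"""


def text_after_last_em(li):
    """Return the text node right after the last <em> in li, or None."""
    ems = li.find_all("em") if li is not None else []
    if not ems:
        return None
    sibling = ems[-1].next_sibling
    # next_sibling can be None or another tag; only accept a text node.
    if isinstance(sibling, NavigableString):
        return sibling.strip()
    return None


soup = BeautifulSoup(html, "html.parser")
result = text_after_last_em(soup.find("li", class_="print text"))
print(result)
```

Returning None instead of raising keeps a loop over many li elements from stopping at the first one that has a different shape.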

CodePudding user response：

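You could use find(text=True, recursive=False) on the span: text=True matches text nodes only, and recursive=False restricts the search to the span's direct children, so text nested inside em or div is skipped. (In Beautiful Soup 4.4.0 and later the same argument is also available under the name string.)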

Example

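from bs4 import BeautifulSoup

# The class attribute on <li> is assumed from the question's own selector.
html = '''<li class="print text">
        <span><em >
                <div >1.29 s</div>
            </em><em >passed</em>This is the text I want to get</span>'''

soup = BeautifulSoup(html, 'html.parser')

# First text node that is a direct child of the span.
print(soup.find('li', class_='print text').span.find(text=True, recursive=False))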

Output

This is the text I want to get

If there are multiple span elements in your li, you could loop over them:

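from bs4 import BeautifulSoup

# The class attribute on <li> is assumed from the question's own selector.
html = '''<li class="print text">
        <span><em >
                <div >1.29 s</div>
            </em><em >passed</em>This is the text I want to get</span>
            <span><em >
                <div >1.50 s</div>
            </em><em >passed</em>This is the text I want to get too</span>'''

soup = BeautifulSoup(html, 'html.parser')

# Select every span inside an li that carries both classes, print its direct text.
for e in soup.select('li.print.text span'):
    print(e.find(text=True, recursive=False))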

Output

This is the text I want to get
This is the text I want to get too
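
Note that find returns only the first direct text node. If the wanted text could be split around a child tag, one possible variant is to collect every direct text node with find_all(string=True, recursive=False) and join them. This is only a sketch: the helper name direct_text and the sample markup are hypothetical, and string= is the newer spelling of the text= argument (Beautiful Soup 4.4.0 and later).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the span's own text is interrupted by a <b> tag.
html = '<span><em>passed</em>first part <b>bold</b> second part</span>'


def direct_text(tag):
    """Join the text nodes that are direct children of tag."""
    parts = tag.find_all(string=True, recursive=False)
    # Drop whitespace-only nodes and trim the rest before joining.
    return " ".join(p.strip() for p in parts if p.strip())


soup = BeautifulSoup(html, "html.parser")
print(direct_text(soup.span))
```

The text inside em and b is left out because neither is a text node directly under the span.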