Extracting only single tags in beautifulsoup

Time:11-10

I'm looking for a way to extract only the tags that don't have another tag nested inside them.

For example:

from bs4 import BeautifulSoup
html = """
<p><a href='XYZ'>Text1</a></p>
<p>Text2</p>
<p><a href='QWERTY'>Text3</a></p>
<p>Text4</p>
"""
soup = BeautifulSoup(html, 'html.parser')
soup.find_all('p')

Gives

[<p><a href="XYZ">Text1</a></p>,
 <p>Text2</p>,
 <p><a href="QWERTY">Text3</a></p>,
 <p>Text4</p>]

This is what I want to achieve:

[<p>Text2</p>,
 <p>Text4</p>]

CodePudding user response:

You can keep only the tags that contain no other tags by checking whether the first element inside each tag is a string:

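for tag in soup.find_all('p'):
    # tag.next is the first element parsed after the opening <p>;
    # it is a string when the paragraph starts with plain text
    if isinstance(tag.next, str):
        print(tag)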

Which returns

<p>Text2</p>
<p>Text4</p>
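
One caveat worth illustrating: the first-element check only looks at what comes right after the opening tag, so a paragraph that starts with text and then contains a link would still pass. The sketch below is not part of the original answer. It uses a made-up third paragraph (`Text5` with an `ABC` link) and a hypothetical helper `has_only_text` to compare the first-element check, written here with `next_element`, against a check over all direct children.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical input: the last <p> starts with text but also contains a link.
html = """
<p><a href='XYZ'>Text1</a></p>
<p>Text2</p>
<p>Text5 <a href='ABC'>link</a></p>
"""
soup = BeautifulSoup(html, 'html.parser')


def has_only_text(tag):
    # True when every direct child is a string, i.e. there are no nested tags.
    return all(isinstance(child, NavigableString) for child in tag.children)


# Only inspects the first element after the opening tag.
first_child_check = [p for p in soup.find_all('p')
                     if isinstance(p.next_element, str)]

# Inspects every direct child of the tag.
all_children_check = [p for p in soup.find_all('p') if has_only_text(p)]

print(first_child_check)
print(all_children_check)
```

With this input the first list still includes the mixed paragraph, while the second keeps only the paragraph made of text alone. For the HTML in the question both checks agree.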

CodePudding user response:

I would simply filter the results afterwards by counting the tags nested inside each p: if a p contains only text, the list of nested tags is empty and the p is kept; otherwise it is filtered out:

for p in soup.find_all('p'):
    # find_all() with no arguments returns every tag nested inside p;
    # an empty result means p holds only text
    if not p.find_all():
        print(p)

Returns only:

<p>Text2</p>
<p>Text4</p>
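
Since the question asks for a list rather than printed lines, the same idea can also be pushed into the search itself. This is a minimal sketch, assuming the `html` from the question; the filter function name `is_leaf_p` is made up for illustration. It relies on `find_all` accepting a function that is called with each tag and returns True for the ones to keep.

```python
from bs4 import BeautifulSoup

html = """
<p><a href='XYZ'>Text1</a></p>
<p>Text2</p>
<p><a href='QWERTY'>Text3</a></p>
<p>Text4</p>
"""
soup = BeautifulSoup(html, 'html.parser')


def is_leaf_p(tag):
    # Match <p> tags that have no nested tags: find(True) returns the
    # first nested tag of any name, or None when there is none.
    return tag.name == 'p' and tag.find(True) is None


# find_all calls is_leaf_p on every tag and keeps those returning True.
leaf_paragraphs = soup.find_all(is_leaf_p)
print(leaf_paragraphs)
```

The result behaves like a regular Python list, so it can be indexed or iterated in the same way as the output of `soup.find_all('p')`.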