I'm looking for a way to extract only the tags that don't contain another tag.
For example:
from bs4 import BeautifulSoup
html = """
<p><a href='XYZ'>Text1</a></p>
<p>Text2</p>
<p><a href='QWERTY'>Text3</a></p>
<p>Text4</p>
"""
soup = BeautifulSoup(html, 'html.parser')
soup.find_all('p')
Gives
[<p><a href="XYZ">Text1</a></p>,
<p>Text2</p>,
<p><a href="QWERTY">Text3</a></p>,
<p>Text4</p>]
This is what I want to achieve:
[<p>Text2</p>,
<p>Text4</p>]
CodePudding user response:
You can filter Tags without other tags in them as follows:
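for tag in soup.find_all('p'):
    # A <p> that starts with text has a string as its next element.
    # Note: this inspects only the first node inside the tag.
    if isinstance(tag.next_element, str):
        print(tag)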
Which returns
<p>Text2</p>
<p>Text4</p>
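One caveat: the check above only looks at the first node inside each p, so a paragraph such as <p>Text3 <a href='QWERTY'>link</a></p> would still pass it. If that case matters, a minimal sketch that inspects every direct child could look like the following. The helper name has_child_tag and the modified third paragraph are my own assumptions for illustration, not part of the question.

```python
from bs4 import BeautifulSoup, Tag

# Same shape as the question's HTML, but the third <p> now starts with text
# and then contains a link (an assumed edge case, not from the question).
html = """
<p><a href='XYZ'>Text1</a></p>
<p>Text2</p>
<p>Text3 <a href='QWERTY'>link</a></p>
<p>Text4</p>
"""
soup = BeautifulSoup(html, 'html.parser')


def has_child_tag(tag):
    """Hypothetical helper: True if any direct child of `tag` is itself a Tag."""
    return any(isinstance(child, Tag) for child in tag.children)


# Keep only the paragraphs whose children are all plain strings.
text_only = [p for p in soup.find_all('p') if not has_child_tag(p)]
print(text_only)
```

Checking direct children is enough here, because any nested tag has to sit inside a child tag first.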
CodePudding user response:
I would simply filter the results afterwards by counting the tags nested inside each p: if a p contains only text, find_all() on it returns an empty list; otherwise the p gets filtered out:
for x in soup.find_all('p'):
    # find_all() with no arguments returns every descendant tag
    if len(x.find_all()) == 0:
        print(x)
Returns only:
<p>Text2</p>
<p>Text4</p>
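If you want the list from the question rather than printed lines, the same idea can probably be folded into the find_all call itself, since find_all accepts a function that receives each tag and returns True or False. This is only a sketch; the function name is_text_only_p is made up for the example.

```python
from bs4 import BeautifulSoup

html = """
<p><a href='XYZ'>Text1</a></p>
<p>Text2</p>
<p><a href='QWERTY'>Text3</a></p>
<p>Text4</p>
"""
soup = BeautifulSoup(html, 'html.parser')


def is_text_only_p(tag):
    """Hypothetical filter: match <p> tags that have no descendant tags."""
    # tag.find(True) returns the first descendant tag of any name, or None.
    return tag.name == 'p' and tag.find(True) is None


# find_all calls the function on every tag and keeps those returning True.
result = soup.find_all(is_text_only_p)
print(result)
```

This keeps the filtering in one place and gives back a list-like result you can use directly.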