Is it possible to test with Beautiful Soup whether a div is a (not necessarily immediate) child, i.e. a descendant, of another div?
E.g.:
<div class='a'>
    <div class='aa'>
        <div class='aaa'>
            <div class='aaaa'>
            </div>
        </div>
    </div>
    <div class='ab'>
        <div class='aba'>
            <div class='abaa'>
            </div>
        </div>
    </div>
</div>
Now I want to test whether the div with class aaaa and the div with class abaa are (not necessarily immediate) children of the div with class aa.
import bs4
with open('test.html','r') as i_file:
    soup = bs4.BeautifulSoup(i_file.read(), 'lxml')
div0 = soup.find('div', {'class':'aa'})
div1 = soup.find('div', {'class':'aaaa'})
div2 = soup.find('div', {'class':'abaa'})
print(div1 in div0) # must return True, but returns False
print(div2 in div0) # must return False
How can this be done?
(Of course, the actual HTML is more complicated, with more nested divs.)
CodePudding user response:
Try collecting all div descendants of the parent with find_all and checking whether any of them has the required class. (find_all_next would also return elements that come after the parent in the document, not just the ones inside it.)
from bs4 import BeautifulSoup
# `text` is assumed to hold the HTML from the question
soup = BeautifulSoup(text, "html.parser")
def is_child(element, parent_class, child_class):
    parent = soup.find("div", attrs={"class": parent_class})
    return any(
        child_class in i.get("class", [])  # tolerate tags without a class
        for i in parent.find_all(element)
    )
print(is_child("div", "aa", "aaa")) # True
print(is_child("div", "abaa", "aa")) # False
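To see why the choice between find_all and find_all_next matters here, this is a small sketch on the HTML from the question. The helper name classes is made up for illustration, and the HTML is inlined rather than read from a file.

```python
from bs4 import BeautifulSoup

html = """
<div class='a'>
  <div class='aa'><div class='aaa'><div class='aaaa'></div></div></div>
  <div class='ab'><div class='aba'><div class='abaa'></div></div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
parent = soup.find("div", class_="aa")

def classes(tags):
    # Hypothetical helper: flatten the class lists of the given tags.
    return [c for tag in tags for c in tag.get("class", [])]

# find_all only walks the subtree below `parent`.
descendants = classes(parent.find_all("div"))
# find_all_next walks everything after `parent` in document order.
following = classes(parent.find_all_next("div"))

print(descendants)  # ['aaa', 'aaaa']
print(following)    # ['aaa', 'aaaa', 'ab', 'aba', 'abaa']
```

So a check built on find_all_next would report abaa as a child of aa, which is not what the question asks for.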
CodePudding user response:
You can use the find_parent method from Beautiful Soup.
import bs4
with open("test.html", "r") as i_file:
    soup = bs4.BeautifulSoup(i_file.read(), "lxml")
div0 = soup.find("div", {"class": "aa"})
div1 = soup.find("div", {"class": "aaaa"})
div2 = soup.find("div", {"class": "abaa"})
print(div1.find_parent(div0.name, attrs=div0.attrs) is not None) # Returns True
print(div2.find_parent(div0.name, attrs=div0.attrs) is not None) # Returns False
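One caveat: find_parent matches by tag name and attributes, so it would also accept a different ancestor that happens to carry the same attributes as div0. If you need to know whether this exact element is an ancestor, a minimal sketch using the .parents generator and an identity check could look like this. The function name is_descendant is hypothetical, and the HTML is inlined for the example.

```python
from bs4 import BeautifulSoup

html = """
<div class='a'>
  <div class='aa'><div class='aaa'><div class='aaaa'></div></div></div>
  <div class='ab'><div class='aba'><div class='abaa'></div></div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

def is_descendant(tag, ancestor):
    # Walk up from `tag` and compare each parent by identity (`is`),
    # so only the very same element object counts as a match.
    return any(parent is ancestor for parent in tag.parents)

div0 = soup.find("div", {"class": "aa"})
div1 = soup.find("div", {"class": "aaaa"})
div2 = soup.find("div", {"class": "abaa"})

print(is_descendant(div1, div0))  # True
print(is_descendant(div2, div0))  # False
```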
CodePudding user response:
Okay, I think I found a way. You have to get all descendant divs of the parent div with find_all:
import bs4
with open('test.html','r') as i_file:
    soup = bs4.BeautifulSoup(i_file.read(), 'lxml')
div0 = soup.find('div', {'class':'aa'})
div1 = soup.find('div', {'class':'aaaa'})
div2 = soup.find('div', {'class':'abaa'})
children = div0.find_all('div')
print(div1 in children)
print(div2 in children)
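If the divs can be identified by class alone, the same containment test can probably also be written as a CSS descendant selector. This is only a sketch: it assumes a Beautiful Soup version with select_one support (4.7 or later, which ships with soupsieve), and the helper name has_descendant is made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<div class='a'>
  <div class='aa'><div class='aaa'><div class='aaaa'></div></div></div>
  <div class='ab'><div class='aba'><div class='abaa'></div></div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

def has_descendant(soup, parent_class, child_class):
    # "div.X div.Y" matches a div.Y at any depth inside a div.X.
    selector = f"div.{parent_class} div.{child_class}"
    return soup.select_one(selector) is not None

print(has_descendant(soup, "aa", "aaaa"))  # True
print(has_descendant(soup, "aa", "abaa"))  # False
```

Note that this matches by selector rather than by a specific element, so it answers "is there some div.aaaa inside some div.aa", which is the same thing only when the classes are unique.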