Beautiful Soup: Test if a div is a child of a div

Time:12-29

Is it possible to test with Beautiful Soup whether a div is a (not necessarily immediate) child of a div?

E.g.:

<div class='a'>
  <div class='aa'>
    <div class='aaa'>
      <div class='aaaa'>
      </div>
    </div>
  </div>
  <div class='ab'>
    <div class='aba'>
      <div class='abaa'>
      </div>
    </div>
  </div>
</div>

Now I want to test whether the div with class aaaa and the div with class abaa are (not necessarily immediate) children of the div with class aa.

import bs4

with open('test.html','r') as i_file:
  soup = bs4.BeautifulSoup(i_file.read(), 'lxml')
div0 = soup.find('div', {'class':'aa'})
div1 = soup.find('div', {'class':'aaaa'})
div2 = soup.find('div', {'class':'abaa'})

print(div1 in div0)  # must return True, but returns False
print(div2 in div0)  # must return False

How can this be done?

(Of course, the actual HTML is more complicated, with more nested divs.)

CodePudding user response:

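Try finding all the descendant elements of the parent using find_all and check whether any of them has the required class attribute. (Don't use find_all_next here: it also returns elements that come after the parent in the document, even if they are not inside it.)

from bs4 import BeautifulSoup

# Load the same test.html used in the question.
with open("test.html", "r") as i_file:
    soup = BeautifulSoup(i_file.read(), "html.parser")


def is_child(element, parent_class, child_class):
    parent = soup.find("div", attrs={"class": parent_class})
    # find_all() searches only inside the parent, at any depth.
    # .get("class", []) avoids a KeyError for tags without a class attribute.
    return parent is not None and any(
        child_class in i.get("class", [])
        for i in parent.find_all(element)
    )


print(is_child("div", "aa", "aaaa"))  # True
print(is_child("div", "aa", "abaa"))  # False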

CodePudding user response:

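You can use the find_parent method from Beautiful Soup. Note that this matches any ancestor with the same tag name and attributes as div0, not necessarily div0 itself.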

import bs4

with open("test.html", "r") as i_file:
    soup = bs4.BeautifulSoup(i_file.read(), "lxml")

div0 = soup.find("div", {"class": "aa"})
div1 = soup.find("div", {"class": "aaaa"})
div2 = soup.find("div", {"class": "abaa"})


print(div1.find_parent(div0.name, attrs=div0.attrs) is not None)  # Returns True
print(div2.find_parent(div0.name, attrs=div0.attrs) is not None)  # Returns False
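If the page may contain several tags with the same name and attributes, an identity-based check can be more robust. The following is a minimal sketch, not part of the original answer: it walks the `.parents` generator and compares with `is`. The helper name `is_descendant` is hypothetical, and the HTML from the question is inlined (parsed with `html.parser`) so the snippet runs on its own.

```python
from bs4 import BeautifulSoup

# Same structure as the HTML in the question, inlined so the sketch is self-contained.
HTML = """
<div class='a'>
  <div class='aa'>
    <div class='aaa'>
      <div class='aaaa'>
      </div>
    </div>
  </div>
  <div class='ab'>
    <div class='aba'>
      <div class='abaa'>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def is_descendant(node, ancestor):
    """Hypothetical helper: True if `ancestor` is a parent of `node` at any depth."""
    # `.parents` yields the parent, grandparent, and so on up to the soup object.
    # `is` compares object identity, so look-alike tags elsewhere do not match.
    return any(parent is ancestor for parent in node.parents)


div0 = soup.find("div", class_="aa")
div1 = soup.find("div", class_="aaaa")
div2 = soup.find("div", class_="abaa")

print(is_descendant(div1, div0))  # True
print(is_descendant(div2, div0))  # False
```

Walking up from the candidate only visits its ancestors, so this stays cheap even when the parent has many descendants.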

CodePudding user response:

Okay, I think I found a way. You have to get all descendant divs of the parent div with find_all (it searches at any depth, not just immediate children):

import bs4

with open('test.html','r') as i_file:
  soup = bs4.BeautifulSoup(i_file.read(), 'lxml')

div0 = soup.find('div', {'class':'aa'})
div1 = soup.find('div', {'class':'aaaa'})
div2 = soup.find('div', {'class':'abaa'})

children = div0.find_all('div')
print(div1 in children)  # True
print(div2 in children)  # False
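When the divs are identified only by their class, as in the question, the same test can probably be written as a single CSS selector, because a space in a selector means "descendant at any depth". This is a rough sketch under that assumption, not the method from the answer above: the helper name `has_descendant` is made up for illustration, and it assumes the class names are plain CSS identifiers.

```python
from bs4 import BeautifulSoup

# Compact copy of the HTML from the question, inlined so the sketch is self-contained.
HTML = (
    "<div class='a'>"
    "<div class='aa'><div class='aaa'><div class='aaaa'></div></div></div>"
    "<div class='ab'><div class='aba'><div class='abaa'></div></div></div>"
    "</div>"
)

soup = BeautifulSoup(HTML, "html.parser")


def has_descendant(soup, ancestor_class, descendant_class):
    """Hypothetical helper: True if some div.<descendant_class> sits inside a div.<ancestor_class>."""
    # "div.x div.y" uses the CSS descendant combinator (the space),
    # which matches at any nesting depth, not only immediate children.
    selector = f"div.{ancestor_class} div.{descendant_class}"
    return soup.select_one(selector) is not None


print(has_descendant(soup, "aa", "aaaa"))  # True
print(has_descendant(soup, "aa", "abaa"))  # False
```

Unlike the `in children` check, this works from class names rather than from Tag objects that were already looked up, so it fits best when no references to the individual divs are needed afterwards.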