How do I have nested find_all statements in BeautifulSoup (Python)?

Time:10-29

I started off by pulling the page with Selenium and I believe I passed the part of the page I needed to BeautifulSoup correctly using this code:

soup = BeautifulSoup(driver.find_element("xpath", '//*[@id="version_table"]/tbody').get_attribute('outerHTML'))

Now I can parse it with BeautifulSoup:

query = soup.find_all("tr", class_=lambda x: x != "hidden*")
print (query)

My problem is that I need to dig deeper than just this one query. For example, I would like to nest this one inside of the first (so the first needs to be true, and then this one):

query2 = soup.find_all("tr", id = "version_new_*")
print (query2)

Logically speaking, this is what I'm trying to do (but I get SyntaxError: invalid syntax):

query = soup.find_all(("tr", class_=lambda x: x != "hidden*") and ("tr", id = "version_new_*"))
print (query)

How do I accomplish this?

CodePudding user response:

Regarding: query = soup.find_all(...) and print (query)

find_all returns a ResultSet, which is a list subclass holding the matching Tag objects, so you can loop over it and handle each match individually:

for query in soup.find_all(...): 
    print(query)
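
Because each item in the result is itself a Tag, you can call find_all on it again, which is the literal "nested" form. The following is a minimal sketch, assuming the html.parser backend; the table markup is hypothetical and only stands in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real version_table body.
html = """
<table><tbody>
  <tr id="version_new_1"><td>1.0</td><td>stable</td></tr>
  <tr id="version_new_2"><td>2.0</td><td>beta</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for row in soup.find_all("tr"):
    # Inner find_all searches only inside the current row.
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    rows.append((row["id"], cells))
    print(row["id"], cells)
```

The inner call is scoped to the row it is called on, so cells from other rows never leak into the result.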

CodePudding user response:

You can pass find_all a lambda (together with a regex) that is called on every element and applies all of your conditions at once:

import re

query = soup.find_all(
    lambda tag: 
        tag.name == 'tr' and
        # id must start with "version_new_"
        'id' in tag.attrs and re.search(r'^version_new_', tag.attrs['id']) and
        # class is stored as a list of names, so test each name separately
        'class' in tag.attrs and not any(re.search(r'^hidden', c) for c in tag.attrs['class'])
)
print(query)

For every element in the HTML, we check:

  1. If the tag is a table row (tr)
  2. If the tag has an id and that id starts with version_new_
  3. If the tag has a class and none of its class names start with hidden
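
The checks above can be tried end to end on a small document. This is a sketch under stated assumptions rather than the answerer's exact code: the sample rows and the helper name wanted are hypothetical, and unlike the checks listed above it also keeps rows that have no class attribute at all, on the assumption that "not hidden" should include them.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical rows; the real page's ids and classes may differ.
html = """
<table><tbody>
  <tr id="version_new_1" class="row"><td>1.0</td></tr>
  <tr id="version_new_2" class="hidden_row"><td>2.0</td></tr>
  <tr id="version_old_3" class="row"><td>0.9</td></tr>
  <tr id="version_new_4"><td>3.0</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

def wanted(tag):
    """True for <tr> tags whose id starts with version_new_ and
    that have no class name starting with hidden."""
    if tag.name != "tr":
        return False
    if not re.match(r"version_new_", tag.get("id", "")):
        return False
    # "class" is multi-valued, so BeautifulSoup stores it as a list.
    return not any(c.startswith("hidden") for c in tag.get("class", []))

matching_ids = [tr["id"] for tr in soup.find_all(wanted)]
print(matching_ids)
```

A named function does the same job as the lambda but leaves room for comments and early returns, which helps once more than two conditions are combined.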