I started by pulling the page with Selenium, and I believe I passed the part of the page I need to BeautifulSoup correctly with this code:
soup = BeautifulSoup(driver.find_element("xpath", '//*[@id="version_table"]/tbody').get_attribute('outerHTML'))
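For reference, the surrounding setup is roughly this (the URL below is just a placeholder):
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("https://example.com/versions")  # placeholder URL, not my real page

table_html = driver.find_element("xpath", '//*[@id="version_table"]/tbody').get_attribute('outerHTML')
soup = BeautifulSoup(table_html, 'html.parser')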
Now I can parse it with BeautifulSoup:
query = soup.find_all("tr", class_=lambda x: x != "hidden*")
print (query)
My problem is that I need to dig deeper than this single query. For example, I would like to combine this next query with the first one (so the first condition must be true, and then this one as well):
query2 = soup.find_all("tr", id = "version_new_*")
print (query2)
Logically speaking, this is what I'm trying to do (but I get SyntaxError: invalid syntax):
query = soup.find_all(("tr", class_=lambda x: x != "hidden*") and ("tr", id = "version_new_*"))
print (query)
How do I accomplish this?
CodePudding user response:
Regarding query = soup.find_all(...) and print(query):
find_all returns an iterable (a ResultSet), so you can loop over the results directly:
for query in soup.find_all(...):
    print(query)
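If you want the cell contents rather than the whole row markup, each item in that loop is a Tag, so (assuming the rows contain td cells) you can do something like:
for row in soup.find_all("tr"):
    # each row is a bs4 Tag; pull the text out of its cells
    print([td.get_text(strip=True) for td in row.find_all("td")])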
CodePudding user response:
You can pass a lambda function (together with regular expressions) that is run against every element to do more advanced conditioning:
import re

query = soup.find_all(
    lambda tag:
        tag.name == 'tr'
        and 'id' in tag.attrs
        and re.search(r'^version_new_', tag.attrs['id'])                    # id starts with version_new_
        and 'class' in tag.attrs
        and not any(re.search(r'^hidden', c) for c in tag.attrs['class'])   # no class starting with hidden
)
print(list(query))
For every element in the HTML, we check:
- whether the tag is a table row (tr)
- whether the tag has an id that starts with version_new_
- whether the tag has a class and none of its classes start with hidden
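As a side note, the same AND condition can also be written as a CSS selector via select (backed by soupsieve); this is a minimal sketch that assumes the rows you want to skip have a class attribute starting with "hidden":
# <tr> whose id starts with "version_new_" and whose class does not start with "hidden"
rows = soup.select('tr[id^="version_new_"]:not([class^="hidden"])')
for row in rows:
    print(row)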