I want to find all of the tds that don't have a custom HTML attribute data-stat="randomValue"
My data looks something like this:
<td data-stat="foo">10</td>
<td data-stat="bar">20</td>
<td data-stat="test">30</td>
<td data-stat="DUMMY"> </td>
I know that I can just select for foo, bar, and test, but my actual dataset will have hundreds of different values for data-stat, so hard-coding them wouldn't be feasible.
Is there something like a != operator that I can use in Beautiful Soup? I tried doing:
[td.getText() for td in rows[i].findAll('td:not([data-stat="DUMMY"])')]
but I only get [] as a value.
CodePudding user response:
You can use a list comprehension to filter out the unwanted tags, for example:
print([td.text for td in soup.find_all("td") if td.get("data-stat") != "DUMMY"])
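As a self-contained sketch of that filter, here is the sample from the question wrapped in a table, plus one extra td without any data-stat attribute (that extra cell and the variable names are my additions, not from the question). It shows that td.get returns None for a missing attribute, so such cells are kept too:

```python
from bs4 import BeautifulSoup

# Sample markup from the question, plus one hypothetical cell with no data-stat.
html = """
<table><tr>
<td data-stat="foo">10</td>
<td data-stat="bar">20</td>
<td data-stat="test">30</td>
<td data-stat="DUMMY"> </td>
<td>40</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# td.get("data-stat") returns None when the attribute is absent,
# so cells without data-stat also pass the != "DUMMY" check.
kept = [td.text for td in soup.find_all("td") if td.get("data-stat") != "DUMMY"]
print(kept)  # ['10', '20', '30', '40']
```

If cells without the attribute should be dropped as well, the condition would need an extra check such as td.has_attr("data-stat").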
Or use a CSS selector with .select (as @Barmar said in the comments, .find_all doesn't accept CSS selectors):
print([td.text for td in soup.select('td:not([data-stat="DUMMY"])')])
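To see why the original attempt returned [], here is a minimal sketch using the sample data from the question (the table wrapper and variable names are mine). It assumes a Beautiful Soup version whose .select supports :not(), which as far as I know means 4.7 or later with soupsieve installed:

```python
from bs4 import BeautifulSoup

# Sample markup from the question, wrapped in a table row.
html = """
<table><tr>
<td data-stat="foo">10</td>
<td data-stat="bar">20</td>
<td data-stat="test">30</td>
<td data-stat="DUMMY"> </td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

selector = 'td:not([data-stat="DUMMY"])'

# find_all treats the string as a tag name, so it searches for an element
# literally named 'td:not([data-stat="DUMMY"])' and matches nothing.
by_name = soup.find_all(selector)
print(by_name)  # []

# select parses the same string as a CSS selector.
by_css = [td.get_text() for td in soup.select(selector)]
print(by_css)  # ['10', '20', '30']
```

The same applies to rows[i] in the question: calling .select on a single row tag instead of .findAll should give the filtered cells for that row.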