How to select HTML elements that do not have a specific attribute value?

As the title says. My HTML looks like this:

<th data-stat='foo'> 10 </th>
<th data-stat='bar'> 20 </th>
<th data-stat='DUMMY'>  </th>

and I tried using

x = [td.getText() for td in rows[i].findAll('td') and not rows[i].findAll(attrs={"data-stat":"DUMMY"})]

but that obviously did not work. My desired output would contain only the text from the elements with data-stat="foo" and data-stat="bar", which would look like:

x=["10","20"]

CodePudding user response：

You can find this easily in the documentation:

from bs4 import BeautifulSoup

table = """<th data-stat='foo'> 10 </th>
<th data-stat='bar'> 20 </th>
<th data-stat='DUMMY'>  </th>"""
soup = BeautifulSoup(table, "lxml")
value_list = []


value_list.append(soup.find("th", {"data-stat": "foo"}).text.strip())
value_list.append(soup.find("th", {"data-stat": "bar"}).text.strip())
print(value_list)
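
The snippet above names each wanted value explicitly. If the goal is to exclude a single value instead, one possible sketch is to pass a function as the attribute filter to find_all. This assumes the built-in "html.parser" rather than lxml, and the names html and cells are only illustrative:

```python
from bs4 import BeautifulSoup

html = """<th data-stat='foo'> 10 </th>
<th data-stat='bar'> 20 </th>
<th data-stat='DUMMY'>  </th>"""

# "html.parser" ships with Python, so no extra parser package is assumed.
soup = BeautifulSoup(html, "html.parser")

# Keep every <th> whose data-stat value is anything other than "DUMMY".
cells = soup.find_all("th", attrs={"data-stat": lambda value: value != "DUMMY"})

value_list = [cell.get_text(strip=True) for cell in cells]
print(value_list)
```

This way the list of wanted values does not have to be maintained by hand when new data-stat values appear.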

CodePudding user response：

Use a CSS selector with the pseudo-class :not() to select your elements:

soup.select('th:not([data-stat="DUMMY"])')

Note: In your question you search for td, while your example contains only th elements.

Example

from bs4 import BeautifulSoup

html = '''
<th data-stat='foo'> 10 </th>
<th data-stat='bar'> 20 </th>
<th data-stat='DUMMY'>  </th>
'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

print([e.get_text(strip=True) for e in soup.select('th:not([data-stat="DUMMY"])')])

Output

['10', '20']
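
Since the question loops over rows, the same selector can probably be applied per row. The following is only a sketch: the two-row table is invented for illustration, and it assumes the real cells are td elements as in the question's code.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table, made up for this example only.
html = """
<table>
  <tr><td data-stat='foo'> 10 </td><td data-stat='bar'> 20 </td><td data-stat='DUMMY'>  </td></tr>
  <tr><td data-stat='foo'> 30 </td><td data-stat='bar'> 40 </td><td data-stat='DUMMY'>  </td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = soup.find_all("tr")

# For each row, collect the text of every <td> that is not marked DUMMY.
result = [
    [cell.get_text(strip=True) for cell in row.select('td:not([data-stat="DUMMY"])')]
    for row in rows
]
print(result)
```

Calling select on each row keeps the cells grouped by row, which matches the rows[i] indexing used in the question.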