I'm trying to scrape a table on a page that has classes on each row. There are some classes that signify that the event has yet to take place and I want to avoid these. The table is similar to this:
<tr class="TRow1 TFuture">
<tr class="TRow2 TFuture">
<tr class="TRow1 TFuture">
<tr class="TRow2 TPresent">
<tr class="TRow1 TPast">
<tr class="TRow2">
All I seem to be able to find is how to select a class that I want. Is there any way to select everything except for a class I don't want?
CodePudding user response:
You can use the :not() CSS pseudo-class in the selector:
from bs4 import BeautifulSoup as soup
s = """
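<tr class="TRow1 TFuture"></tr>
<tr class="TRow2 TFuture"></tr>
<tr class="TRow1 TFuture"></tr>
<tr class="TRow2 TPresent"></tr>
<tr class="TRow1 TPast"></tr>
<tr class="TRow2"></tr>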
"""
# select every <tr> that does not carry the TFuture class
tr = soup(s, 'html.parser').select('tr:not(.TFuture)')
print(tr)
Output:
[<tr class="TRow2 TPresent"></tr>, <tr class="TRow1 TPast"></tr>, <tr class="TRow2"></tr>]
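If more than one class has to be skipped, the same idea can probably be extended either by chaining :not() once per class or by filtering on each row's class list in plain Python. The following is only a sketch under assumptions: the sample rows, the cell contents, and the names excluded, rows_css and rows_py are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the <td> contents are invented
html = """
<table>
<tr class="TRow1 TFuture"><td>a</td></tr>
<tr class="TRow2 TPresent"><td>b</td></tr>
<tr class="TRow1 TPast"><td>c</td></tr>
<tr class="TRow2"><td>d</td></tr>
</table>
"""

page = BeautifulSoup(html, "html.parser")

# Hypothetical set of classes to leave out
excluded = {"TFuture", "TPresent"}

# Option 1: build a selector with one :not(.name) per excluded class
selector = "tr" + "".join(f":not(.{name})" for name in sorted(excluded))
rows_css = page.select(selector)

# Option 2: filter in Python; "class" is returned as a list of names,
# and rows without a class attribute fall back to an empty list
rows_py = [
    row for row in page.find_all("tr")
    if not excluded.intersection(row.get("class", []))
]

print([row.td.text for row in rows_css])
print([row.td.text for row in rows_py])
```

Both approaches should keep only the rows that have none of the excluded classes, including rows with no class attribute at all. The CSS version is shorter; the Python filter may be easier to adapt if the exclusion rule gets more complicated.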