bs4 remove None tag after decompose

I want to delete advertisement text from scraped data, but after I decompose it I get an error saying:

list index out of range

I think it's because decompose leaves a blank space or something behind. Without decompose, the loop works fine.

import requests
from bs4 import BeautifulSoup

url = 'https://www.marketbeat.com/insider-trades/ceo-share-buys-and-sales/'

companyName = 'title-area'

r = requests.get(url)

soup = BeautifulSoup(r.content, 'html.parser')

table = soup.find_all('table')[0].tbody.find_all('tr')

# delete advertisement
soup.find("tr", class_="bottom-sort").decompose()

for el in table:
    print(el.find_all('td')[0].text)

CodePudding user response：

You can use tag.extract() to remove the tag. More importantly, remove it before you collect the <tr> tags, otherwise the removed row is still in the list you loop over:

import requests
from bs4 import BeautifulSoup

url = "https://www.marketbeat.com/insider-trades/ceo-share-buys-and-sales/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# delete advertisement rows (there may be none or several)
for tr in soup.select("tr.bottom-sort"):
    tr.extract()

table = soup.find_all("table")[0].tbody.find_all("tr")

for el in table:
    print(el.find_all("td")[0].text)

Prints:


...
TZOOTravelzoo
NEOGNeogen Co.
RKTRocket Companies, Inc.
FINWFinWise Bancorp
WMPNWilliam Penn Bancorporation
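
To try this without hitting the live site, here is a small sketch on made-up inline HTML. The markup and the remove_ad_rows helper are assumptions for illustration, not taken from the real page. It shows that extract() returns the removed tag, which can still be inspected, and that looping over select() copes with zero or several matching rows, whereas soup.find(...) returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (assumption).
HTML = """
<table><tbody>
<tr><td>TZOO</td><td>Travelzoo</td></tr>
<tr class="bottom-sort"><td>Ad one</td></tr>
<tr><td>NEOG</td><td>Neogen Co.</td></tr>
<tr class="bottom-sort"><td>Ad two</td></tr>
</tbody></table>
"""


def remove_ad_rows(soup):
    """Hypothetical helper: detach every tr.bottom-sort and return the removed rows."""
    # select() returns a list, so nothing happens when there are no matches.
    return [tr.extract() for tr in soup.select("tr.bottom-sort")]


soup = BeautifulSoup(HTML, "html.parser")
removed = remove_ad_rows(soup)

# Extracted tags are detached from the tree but keep their contents.
removed_texts = [tr.get_text(strip=True) for tr in removed]

# Collect the rows only after the ad rows are gone.
rows = soup.find_all("table")[0].tbody.find_all("tr")
first_cells = [row.find_all("td")[0].text for row in rows]

# A second call finds nothing and returns an empty list instead of raising.
removed_again = remove_ad_rows(soup)

print(removed_texts)
print(first_cells)
```

If you do not need the removed rows afterwards, calling decompose() inside the same loop should work just as well.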

CodePudding user response：

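There is nothing wrong with using decompose(); you only have to pay attention to the order of the steps. In your code the rows are collected first, so the decomposed row is still in your list but has no <td> children left, and find_all('td')[0] raises the IndexError: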

# first delete the advertisement
soup.find("tr", class_="bottom-sort").decompose()

# then select the table rows
table = soup.find_all('table')[0].tbody.find_all('tr')
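
Why the order matters can be seen in a minimal sketch on made-up inline HTML (the markup is an assumption, not the real page). A list of rows built before the decompose() call still holds a reference to the destroyed row, which no longer has any <td> children, while a list built afterwards contains only the data rows.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (assumption).
html = """
<table><tbody>
<tr><td>TZOO</td><td>Travelzoo</td></tr>
<tr class="bottom-sort"><td>Advertisement</td></tr>
<tr><td>NEOG</td><td>Neogen Co.</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Rows collected BEFORE the removal: the list keeps a reference to the ad row.
rows_before = soup.find_all("table")[0].tbody.find_all("tr")

soup.find("tr", class_="bottom-sort").decompose()

# The decomposed row is still in the old list but has lost its children,
# so find_all("td") on it is empty and indexing [0] would raise IndexError.
cell_counts = [len(row.find_all("td")) for row in rows_before]

# Rows collected AFTER the removal contain only the data rows.
rows_after = soup.find_all("table")[0].tbody.find_all("tr")
first_cells = [row.find_all("td")[0].text for row in rows_after]

print(cell_counts)
print(first_cells)
```

Note that soup.find(...) returns None when no matching row exists, so on pages without an ad row the .decompose() call itself would fail; checking the result for None first avoids that.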