bs4 remove None tag after decompose

Time:08-24

I want to delete advertisement text from scraped data, but after I call decompose() on it I get an error saying:

list index out of range

I think it's because decompose() leaves a blank space or something behind. Without decompose() the loop works fine.

import requests
from bs4 import BeautifulSoup

url = 'https://www.marketbeat.com/insider-trades/ceo-share-buys-and-sales/'

companyName = 'title-area'

r = requests.get(url)

soup = BeautifulSoup(r.content, 'html.parser')

table = soup.find_all('table')[0].tbody.find_all('tr')

# delete advertisement
soup.find("tr", class_="bottom-sort").decompose()

for el in table:
    print(el.find_all('td')[0].text)

CodePudding user response:

You can use tag.extract() to delete the tag. Also, delete the tag before you find all <tr> tags:

import requests
from bs4 import BeautifulSoup

url = "https://www.marketbeat.com/insider-trades/ceo-share-buys-and-sales/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# delete advertisement
for tr in soup.select("tr.bottom-sort"):
    tr.extract()

table = soup.find_all("table")[0].tbody.find_all("tr")

for el in table:
    print(el.find_all("td")[0].text)

Prints:


...
TZOOTravelzoo
NEOGNeogen Co.
RKTRocket Companies, Inc.
FINWFinWise Bancorp
WMPNWilliam Penn Bancorporation
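
As a small illustration of what extract() does, here is a sketch on made-up markup (the SAMPLE string and the pull_out_ad helper are assumptions for the example, not part of the real page). extract() removes the tag from the tree and returns it, so the removed row can still be inspected afterwards:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (an assumption for this sketch).
SAMPLE = (
    '<table>'
    '<tr class="bottom-sort"><td>Advertisement</td></tr>'
    '<tr><td>TZOO</td></tr>'
    '</table>'
)


def pull_out_ad(markup):
    """Remove the ad row and report its text plus the number of rows left."""
    soup = BeautifulSoup(markup, "html.parser")
    # extract() detaches the tag from the tree and hands it back intact.
    ad = soup.find("tr", class_="bottom-sort").extract()
    return ad.td.get_text(), len(soup.find_all("tr"))


print(pull_out_ad(SAMPLE))
```

If the removed row is never needed again, decompose() should work just as well, as long as it runs before the rows are collected.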

CodePudding user response:

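There is nothing wrong with using decompose(); you only have to pay attention to the order of the steps. Your list of rows was collected before the ad row was decomposed, so it still holds that now-empty row, and find_all('td')[0] fails on it:

# first delete the advertisement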
soup.find("tr", class_="bottom-sort").decompose()

# then select the table rows
table = soup.find_all('table')[0].tbody.find_all('tr')
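
To try the ordering without hitting the live site, here is a minimal self-contained sketch. The HTML string and the first_cells helper are invented for illustration and only assume that the ad row carries the bottom-sort class, as in the question. It also skips any row that has no <td>, which is an extra guard and not something the original code does:

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real page is assumed to look roughly like this.
HTML = """
<table>
  <tbody>
    <tr><td>TZOO</td><td>Travelzoo</td></tr>
    <tr class="bottom-sort"><td>Advertisement</td></tr>
    <tr><td>NEOG</td><td>Neogen Co.</td></tr>
  </tbody>
</table>
"""


def first_cells(markup):
    """Return the text of the first cell of every non-ad row."""
    soup = BeautifulSoup(markup, "html.parser")

    # Step 1: remove the ad rows while they are still part of the tree.
    for ad_row in soup.select("tr.bottom-sort"):
        ad_row.decompose()

    # Step 2: only now collect the rows, so the list never sees the ad row.
    rows = soup.find("table").tbody.find_all("tr")

    # Extra guard (an addition): ignore rows that contain no <td> at all.
    return [
        row.find("td").get_text(strip=True)
        for row in rows
        if row.find("td") is not None
    ]


print(first_cells(HTML))  # ['TZOO', 'NEOG']
```

Looping over select() instead of calling find() once also avoids an AttributeError when the page contains no ad row, because find() would return None in that case.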