I want to delete advertisment text from scraped data but after i decompose it i get error saying
list index out of range
I think its becouse after decompose is blank space or somthing. Without decompose loop works ok.
import requests
from bs4 import BeautifulSoup
url = 'https://www.marketbeat.com/insider-trades/ceo-share-buys-and-sales/'
companyName = 'title-area'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
table = soup.find_all('table')[0].tbody.find_all('tr')
# delete advertisment
soup.find("tr", class_="bottom-sort").decompose()
for el in table:
print(el.find_all('td')[0].text)
CodePudding user response:
You can use tag.extract()
to delete the tag. Also, delete the tag before you find all <tr>
tags:
import requests
from bs4 import BeautifulSoup
url = "https://www.marketbeat.com/insider-trades/ceo-share-buys-and-sales/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# delete advertisment
for tr in soup.select("tr.bottom-sort"):
tr.extract()
table = soup.find_all("table")[0].tbody.find_all("tr")
for el in table:
print(el.find_all("td")[0].text)
Prints:
...
TZOOTravelzoo
NEOGNeogen Co.
RKTRocket Companies, Inc.
FINWFinWise Bancorp
WMPNWilliam Penn Bancorporation
CodePudding user response:
There is nothing wrong using decompose()
you only have to pay attention to the order in your process:
# first delete advertisment
soup.find("tr", class_="bottom-sort").decompose()
# then select the table rows
table = soup.find_all('table')[0].tbody.find_all('tr')