I am trying to create a news aggregator that scrapes headlines from NY Times using BeautifulSoup4.
I want to include the first 15 elements with an h3 tag on the site. However, the 9th element with an h3 tag on NY Times is an advertisement.
How can I exclude it?
Here's my code:
ht_r = requests.get("https://www.nytimes.com/")
ht_soup = BeautifulSoup(ht_r.content, 'html.parser')
ht_headings = ht_soup.findAll('h3')
ht_headings = ht_headings[0:15]
ht_news = []
I have tried to do
del ht_headings[9]
However, I am getting this error:
SyntaxError: cannot delete function call
CodePudding user response:
You can try joining the slices before and after the unwanted index:
ht_headings = ht_headings[:9] + ht_headings[10:]
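Here is a minimal sketch of the same slicing idea on a stand-in list, so it can be run without hitting the site. The names `headings`, `skip` and `first_15` are made up for illustration, and the index 9 is taken from the question; if the ad really is the ninth item counting from one, its zero-based index would be 8.

```python
# Stand-in data; in the real script this would be the list returned by findAll('h3').
headings = [f"h{n}" for n in range(20)]

skip = 9  # assumed zero-based index of the unwanted item

# Join everything before the index with everything after it.
headings = headings[:skip] + headings[skip + 1:]

# Cut to 15 only after removing the item, so 15 headings remain.
first_15 = headings[:15]
print(first_15)
```

As a side note, `del ht_headings[9]` with square brackets also works on a list. Python raises "SyntaxError: cannot delete function call" when the index is written with parentheses, as in `del ht_headings(9)`, so it may be worth checking for that in the actual script.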
CodePudding user response:
Maybe just loop through the list like this?
import requests
from bs4 import BeautifulSoup

ht_r = requests.get("https://www.nytimes.com/")
ht_soup = BeautifulSoup(ht_r.content, 'html.parser')
ht_headings = ht_soup.findAll('h3')
output = []
# enumerate supplies the index, so there is no counter to increment by hand
for i, heading in enumerate(ht_headings):
    # keep the first 15 headings, skipping the one at index 9
    if i != 9 and i < 15:
        output.append(heading)
print(output)
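If you want this as something reusable, a rough sketch could look like the following. `take_skipping` is a hypothetical helper, not part of BeautifulSoup, and the sample list stands in for the scraped headings. One difference to be aware of: a loop that keeps indexes below 15 except 9 ends up with 14 headings, while this version keeps going until it has collected the full 15.

```python
from itertools import islice


def take_skipping(items, skip_indexes, limit):
    """Return the first `limit` items whose index is not in `skip_indexes`."""
    kept = (item for i, item in enumerate(items) if i not in skip_indexes)
    return list(islice(kept, limit))


# Stand-in data; replace with the list returned by findAll('h3').
sample = [f"headline {n}" for n in range(20)]
result = take_skipping(sample, {9}, 15)
print(result)
```

Passing a set of indexes makes it easy to drop more than one item if the page layout changes.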