I'm trying to use the following code to scrape all the tags from a website whose href attribute matches the pattern /how-to-use/[a-zA-Z]
Here is the code:
import re

import requests
from bs4 import BeautifulSoup

webpage = requests.get('https://www.talkenglish.com/vocabulary/top-1500-nouns.aspx').content
soup = BeautifulSoup(webpage, "html.parser")

def has_how_to_use(tag):
    pattern = re.compile('\/how-to-use\/[a-zA-Z]+')
    return bool(re.search(pattern, tag.attr('href')))

word_list = soup.find_all(has_how_to_use)
But I keep getting an error saying that a 'NoneType' object is not callable, and I'm not sure which part is evaluating to None.
CodePudding user response:
You can pass your compiled regular expression as the href keyword argument to find_all() to look for all <a> tags whose href contains your pattern:
soup = BeautifulSoup(webpage, "html.parser")

for tag in soup.find_all("a", href=re.compile(r"/how-to-use/[a-zA-Z]+")):
    print(tag)
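As for which part is None: BeautifulSoup treats attribute-style access such as tag.attr as a search for a child tag named <attr>. No such tag exists, so the lookup returns None, and calling it with ('href') raises the error. If you would rather keep a filter function, here is a minimal sketch using tag.get('href'). It assumes the intended pattern ends in a + quantifier, and it uses a small hypothetical HTML snippet in place of the real page so that it runs without a network request:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real site's markup).
html = """
<a href="/how-to-use/time">time</a>
<a href="/how-to-use/year">year</a>
<a href="/vocabulary/other.aspx">other</a>
<a name="no-href">anchor without an href</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumption: the original pattern was meant to end with a "+" quantifier.
pattern = re.compile(r"/how-to-use/[a-zA-Z]+")

# Attribute-style access looks for a child tag called <attr>; none exists,
# so this is None, which is why tag.attr('href') fails.
first_link = soup.find("a")
print(first_link.attr)

def has_how_to_use(tag):
    # Tag.get() returns None when the attribute is missing instead of raising.
    href = tag.get("href")
    return href is not None and pattern.search(href) is not None

word_list = soup.find_all(has_how_to_use)
hrefs = [tag["href"] for tag in word_list]
print(hrefs)
```

The explicit None check matters because find_all() calls the function on every tag in the document, including ones that have no href at all.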