Can't get beautiful soup to work with my callback function and regex

Time:10-28

I'm trying to use the following code to scrape all the tags from a website whose href attribute matches the pattern /how-to-use/[a-zA-Z]+

The code is here:

import requests
from bs4 import BeautifulSoup
import re

webpage = requests.get('https://www.talkenglish.com/vocabulary/top-1500-nouns.aspx').content
soup = BeautifulSoup(webpage, "html.parser")

def has_how_to_use(tag):
    pattern = re.compile('\/how-to-use\/[a-zA-Z]+')
    return bool(re.search(pattern, tag.attr('href')))

word_list = soup.find_all(has_how_to_use)

But I keep getting an error about not being able to call a NoneType object, and I'm not sure which part is evaluating to None.

CodePudding user response:

You can pass your compiled regular expression as the href keyword argument to find_all() to find all <a> tags whose href contains your pattern:

soup = BeautifulSoup(webpage, "html.parser")

for tag in soup.find_all("a", href=re.compile(r"/how-to-use/[a-zA-Z]+")):
    print(tag)
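
As for which bit is None: in Beautiful Soup, accessing an unknown attribute on a tag, such as tag.attr, is treated as a search for a child tag with that name (here, a child <attr> tag). None exists, so tag.attr is None, and calling it raises the error. If you want to keep the callback, a minimal sketch could look like the one below. It assumes the pattern is meant to be /how-to-use/[a-zA-Z]+, and it uses a small made-up HTML snippet (SAMPLE_HTML, not the real page) so it runs without a network request. Note that find_all() passes every tag to the callback, including ones with no href, so the function has to handle a missing attribute.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page (an assumption,
# used only so the example runs offline).
SAMPLE_HTML = """
<div>
  <a href="/how-to-use/time">time</a>
  <a href="/how-to-use/year">year</a>
  <a href="/vocabulary/top-1500-nouns.aspx">nouns</a>
  <a name="top">anchor without href</a>
  <p>not a link</p>
</div>
"""

# Compile once, outside the callback.
HOW_TO_USE = re.compile(r"/how-to-use/[a-zA-Z]+")

def has_how_to_use(tag):
    # tag.get('href') reads the attribute and returns None when it is missing,
    # whereas tag.attr looks for a child <attr> tag and evaluates to None.
    href = tag.get("href")
    return href is not None and bool(HOW_TO_USE.search(href))

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Callback version: every tag in the document is passed to has_how_to_use.
word_list = soup.find_all(has_how_to_use)
hrefs = [tag["href"] for tag in word_list]

# Keyword version from the answer above, for comparison.
keyword_hrefs = [tag["href"] for tag in soup.find_all("a", href=HOW_TO_USE)]

print(hrefs)
print(keyword_hrefs)
```

On this sample both approaches select the same two links; the keyword form is shorter because Beautiful Soup skips tags without an href for you.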