I'm using bs4 to get HTML tags from a web page:
html = BeautifulSoup(requests.get(temp_cat_link).text, 'html.parser')
items = html.findAll('h4', {'class': 'item-title font-weight-normal '})  # the class value here has white space at the end
But when I check, it doesn't actually get all of the tags, because some of the class names don't have white space at the end. It only returns the tags whose class is 'item-title font-weight-normal ' (with the trailing space). So I changed my code to this:
html = BeautifulSoup(requests.get(temp_cat_link).text, 'html.parser')
items = html.findAll('h4', {'class': 'item-title font-weight-normal'})  # the class value here has no white space at the end
But this only gets the tags whose class is 'item-title font-weight-normal' (without the trailing space).
My question is: how can I get all of the tags whose class is either 'item-title font-weight-normal ' (with the trailing space) or 'item-title font-weight-normal' (without it), using just a single html.findAll call?
CodePudding user response:
You can use a regex to match the class string with or without the trailing space:
import re
import requests
from bs4 import BeautifulSoup
html = BeautifulSoup(requests.get(temp_cat_link).text, 'html.parser')
# \s* allows zero or more white space characters after the class names
items = html.findAll('h4', {'class': re.compile(r'item-title font-weight-normal\s*')})
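If a single findAll call is not a hard requirement, a CSS selector may be a simpler option, since it tests each class name on its own and so should not care about white space around them or about their order. Here is a minimal sketch of that idea. The sample_html markup is made up for illustration and stands in for the page fetched from temp_cat_link.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not the real page); the first class
# attribute ends with a trailing space, the second does not.
sample_html = """
<h4 class="item-title font-weight-normal ">First</h4>
<h4 class="item-title font-weight-normal">Second</h4>
<h4 class="item-title">Other</h4>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# Select every <h4> that carries both classes, whatever the
# white space inside the class attribute looks like.
items = soup.select('h4.item-title.font-weight-normal')
titles = [item.get_text() for item in items]
print(titles)
```

Only the two h4 tags that have both classes are selected; the one with just item-title is skipped.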