BS4 findAll HTML tags that share part of a class name

Time:11-16

I'm using bs4 to get HTML tags from a web page:

html = BeautifulSoup(requests.get(temp_cat_link).text, 'html.parser')
items = html.findAll('h4', {'class': 'item-title font-weight-normal '})  # this class string has a trailing space

But when I checked, it did not return all the tags, because the class string of some tags doesn't have the trailing white space. It only returns the 'item-title font-weight-normal ' tags (with the trailing space). So I changed my code to this:

html = BeautifulSoup(requests.get(temp_cat_link).text, 'html.parser')
items = html.findAll('h4', {'class': 'item-title font-weight-normal'})  # this class string has no trailing space

But this only returns the 'item-title font-weight-normal' tags (without the trailing space). The question is: how can I get all tags whose class shares the same string, that is, both 'item-title font-weight-normal ' and 'item-title font-weight-normal', with a single call to html.findAll?

CodePudding user response:

You can use a regex to match the class string with or without the trailing space:

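import re

import requests
from bs4 import BeautifulSoup

# temp_cat_link is the page URL from the question
html = BeautifulSoup(requests.get(temp_cat_link).text, 'html.parser')
# \s* allows zero or more whitespace characters after the class string
items = html.findAll('h4', {'class': re.compile(r'item-title font-weight-normal\s*')})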
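
As an alternative to the regex, a CSS selector may be a simpler way to express "an h4 that has both classes". The sketch below is only an illustration: the sample_html string is made-up markup standing in for the real page (it is not from the question), and it uses select rather than findAll. Because select checks each class separately, stray whitespace in the class attribute should not matter.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not the real page):
# one class attribute has a trailing space, one does not,
# and one heading lacks the second class entirely.
sample_html = """
<h4 class="item-title font-weight-normal ">First</h4>
<h4 class="item-title font-weight-normal">Second</h4>
<h4 class="item-title">Third</h4>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# The selector requires both classes on an <h4>,
# regardless of surrounding whitespace or class order.
items = soup.select("h4.item-title.font-weight-normal")

# Collect the visible text of each matched heading.
titles = [item.get_text(strip=True) for item in items]
print(titles)
```

With this sample markup, the first two headings are selected and the third is skipped because it lacks font-weight-normal. On the real page you would build soup from requests.get(temp_cat_link).text as in the answer above.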