I'm using BeautifulSoup to scrape an HTML page where the information I need is stored in markup like this:
<a href="site0.html"> Title 0 </a>
<a href="site1.html"> Title 1 </a>
<a href="site2.html"> Title 2 </a>
[...]
I'd like to get "Title 0", "Title 1" and "Title 2", but the class name changes for each item, so I'm using a regex like this:
titles = soup.findAll("a", attrs={"class": re.compile('^TitleonContext.*')})
for title in titles:
    print(title)
But it's not working (nothing is printed). What am I doing wrong?
Answer:
Try using the following regex instead: re.compile(r'.*TitleonContext') (or, equivalently, re.compile('.*TitleonContext'), since the pattern has no backslashes). The ^ in your pattern anchors the match, so it only matches class values that start with "TitleonContext".
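To see why the ^ matters, here is a minimal stdlib-only sketch. The class values below are hypothetical (the question doesn't show the real ones), and the helper `matching` is an illustrative name, not part of BeautifulSoup. It mimics how BeautifulSoup applies a compiled regex: according to its documentation it calls the pattern's search() method, and for a multi-valued attribute like class it checks the individual class values.

```python
import re

# Hypothetical class values; the real ones aren't shown in the question.
class_values = ["TitleonContext0", "item-TitleonContext1", "headerTitleonContext2", "other"]

anchored = re.compile(r'^TitleonContext.*')   # pattern from the question
unanchored = re.compile(r'.*TitleonContext')  # pattern suggested in the answer


def matching(pattern, values):
    """Hypothetical helper: keep the values the pattern finds via search(),
    the same method BeautifulSoup uses when given a compiled regex."""
    return [v for v in values if pattern.search(v)]


# ^ only succeeds when "TitleonContext" is at the very start of the value.
print(matching(anchored, class_values))    # ['TitleonContext0']
# Without ^, "TitleonContext" may appear anywhere in the value.
print(matching(unanchored, class_values))  # ['TitleonContext0', 'item-TitleonContext1', 'headerTitleonContext2']
```

Because search() already scans the whole string, re.compile('TitleonContext') would select the same values as the answer's pattern; the leading .* is harmless but not required.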