Hello, I am trying to get a tag whose attribute value contains a non-breaking space (nbsp), but when I do something like this:
a_url=soup.find_all('a', {"aria-label":"Siguiente "})
a_url
it returns an empty list.
How can I match the real value?
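The empty list comes from an exact string comparison. Below is a minimal sketch of the effect. It assumes the label on the page is "Siguiente", then a non-breaking space (U+00A0), then "»". The markup is hypothetical and not taken from the real site, and the built-in "html.parser" is used so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes the label is "Siguiente", then a non-breaking space, then "»"
html = '<a href="/next" aria-label="Siguiente&nbsp;»">next</a>'
soup = BeautifulSoup(html, "html.parser")

# The parser decodes &nbsp; into the character U+00A0, which looks like a space but is not one
label = soup.a["aria-label"]
print(repr(label))  # 'Siguiente\xa0»'

# find_all() compares the attribute value exactly, so an ordinary space (U+0020) never matches
print(soup.find_all("a", {"aria-label": "Siguiente »"}))  # []

# Spelling the non-breaking space as the escape \xa0 gives an exact match
print(soup.find_all("a", {"aria-label": "Siguiente\xa0»"}))
```

Typing the value with a regular space in the editor is what makes the lookup fail. Writing the character as \xa0 in the Python string is one way to match it exactly.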
Answer:
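If you are looking for the href of the "next page" button, you can get it with this CSS selector (prerequisite: it is the only such link on the page). Because *= is a substring match, the selector only needs "Siguiente" to appear somewhere in the aria-label, so whatever follows it (a non-breaking space, ») does not matter: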
soup.select_one('a[aria-label*="Siguiente"]')['href']
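If you prefer find()/find_all() over CSS selectors, Beautiful Soup also accepts a compiled regular expression or a function as the attribute filter. The sketch below compares three approaches on hypothetical pagination markup; the helper name is_next_label is made up for illustration. The function variant relies on str.split() treating U+00A0 as whitespace, which it does in Python 3.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical pagination markup; both labels contain a non-breaking space before the arrow
html = """
<a href="/prev" aria-label="Anterior&nbsp;«">prev</a>
<a href="/next" aria-label="Siguiente&nbsp;»">next</a>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. CSS substring match: [attr*="text"] only requires "Siguiente" to appear somewhere
css_hit = soup.select_one('a[aria-label*="Siguiente"]')

# 2. find() with a compiled regex: Beautiful Soup calls .search() on the attribute value
regex_hit = soup.find("a", attrs={"aria-label": re.compile(r"^Siguiente")})


# 3. find() with a function (hypothetical helper) that collapses any Unicode whitespace,
#    including U+00A0, to one ordinary space before comparing; a missing attribute arrives as None
def is_next_label(value):
    return value is not None and " ".join(value.split()) == "Siguiente »"


func_hit = soup.find("a", attrs={"aria-label": is_next_label})

print(css_hit["href"], regex_hit["href"], func_hit["href"])  # /next /next /next
```

The substring and regex versions are the most forgiving. The normalizing function is useful when you still want an exact comparison but do not want to care which kind of space the site used.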
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup('''<a href="/jobs?q=ingeniero&start=30&pp=gQAtAAAAAAAAAAAAAAABtmO50gBfAQEBChquI8zsEUMb97LmWiIyJ6B9BupjjNHe0wHVJkxir7vk5faUnfGbH8SIKViz3xGntfsggaFcG0AVf914ketkZJK-TUcyKlIrQmiKVG-Mkh5cMa0vUE4tVGeMixwAAA" aria-label="Siguiente »" data-pp="gQAtAAAAAAAAAAAAAAABtmO50gBfAQEBChquI8zsEUMb97LmWiIyJ6B9BupjjNHe0wHVJkxir7vk5faUnfGbH8SIKViz3xGntfsggaFcG0AVf914ketkZJK-TUcyKlIrQmiKVG-Mkh5cMa0vUE4tVGeMixwAAA" onmousedown="addPPUrlParam && addPPUrlParam(this);" rel="nofollow"><span class="pn"><span class="np"><svg width="24" height="24" fill="none"><path d="M10 6L8.59 7.41 13.17 12l-4.58 4.59L10 18l6-6-6-6z" fill="#2D2D2D"></path></svg></span></span></a>''', "lxml")
soup.select_one('a[aria-label*="Siguiente"]')['href']
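One caveat with indexing the result directly: select_one() returns None when nothing matches, for example on the last results page, and None['href'] raises a TypeError. A small guarded variant is sketched below. The helper name next_page_href and the sample pages are hypothetical.

```python
from bs4 import BeautifulSoup


def next_page_href(soup):
    """Return the href of the "Siguiente" link, or None if the page has none (hypothetical helper)."""
    link = soup.select_one('a[aria-label*="Siguiente"]')
    # select_one() returns None when nothing matches; subscripting None would raise TypeError
    if link is None:
        return None
    # .get() also returns None if the matched <a> has no href attribute
    return link.get("href")


# Usage on two small hypothetical pages: one with a next link, one without (e.g. the last page)
page = BeautifulSoup('<a href="/jobs?start=30" aria-label="Siguiente&nbsp;»">x</a>', "html.parser")
last_page = BeautifulSoup("<p>No more results</p>", "html.parser")

print(next_page_href(page))       # /jobs?start=30
print(next_page_href(last_page))  # None
```

Returning None instead of raising makes it easy to stop a pagination loop once the "Siguiente" link disappears.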