I'm trying to get the href attributes from a table in this
Answer:
You can use:
# x.get('src', '') avoids a KeyError on <img> tags that have no src attribute
for img in soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x.get('src', '')):
    print(img.next_sibling.next_sibling['href'])
Notes:
- The find_all call with the lambda filter fetches all img nodes whose src attribute contains automatica.gif.
- img.next_sibling.next_sibling['href'] gets the href value of the second sibling of each found img tag. The first sibling is typically the whitespace text node between the img and the a tag, which is why two hops are needed.
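To see why two next_sibling hops are needed, here is a small offline sketch. The HTML is hypothetical, modelled on the structure the answer assumes (an img followed by an a inside a table cell); it is not the real markup of the meteo.navarra.es page. It also shows find_next_sibling("a"), a sturdier alternative that is swapped in here and is not part of the answer.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: an <img> followed, after some whitespace, by an <a>.
html = """
<table>
  <tr><td>
    <img src="/img/automatica.gif"/>
    <a href="datos.cfm?id=1">Station 1</a>
  </td></tr>
  <tr><td>
    <img src="/img/manual.gif"/>
    <a href="datos.cfm?id=2">Station 2</a>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

def is_automatic_img(tag):
    # .get() avoids a KeyError on <img> tags without a src attribute.
    return tag.name == "img" and "automatica.gif" in tag.get("src", "")

imgs = soup.find_all(is_automatic_img)

# The first hop lands on the whitespace text between the <img> and the <a>,
# not on a tag; that is why the answer takes two hops.
first_hop = imgs[0].next_sibling
print(isinstance(first_hop, NavigableString))  # True

# The answer's approach: two hops reach the <a> tag.
two_hops = [img.next_sibling.next_sibling["href"] for img in imgs]

# Sturdier alternative (not from the answer): find_next_sibling("a")
# skips any number of text nodes until it reaches an <a>.
next_a = [img.find_next_sibling("a")["href"] for img in imgs]

print(two_hops)  # ['datos.cfm?id=1']
print(next_a)    # ['datos.cfm?id=1']
```

If the real page ever has extra text or tags between the img and the a, the two-hop version breaks while find_next_sibling("a") keeps working.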
Answer:
By preference I would use CSS selectors for speed, and simply filter on img tags whose src contains automatica. Then move to the adjacent a tag with the adjacent sibling combinator (+), and extract the href.
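import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://meteo.navarra.es/estaciones/descargardatos.cfm')
soup = bs(r.content, 'lxml')
# img[src*=automatica] matches <img> tags whose src contains "automatica";
# "+ a" then selects the <a> element immediately following each such <img>.
automaticas = ['http://meteo.navarra.es/estaciones/' + i['href'] for i in soup.select('img[src*=automatica] + a')]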
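The difference between the descendant combinator (a space) and the adjacent sibling combinator (+) can be checked offline. This sketch uses hypothetical markup (the href values are made up, not taken from the real page) and swaps the answer's string concatenation for urllib.parse.urljoin, which is an assumption of this sketch rather than part of the answer.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Base URL taken from the answer; the markup below is hypothetical.
BASE = "http://meteo.navarra.es/estaciones/"

html = """
<table>
  <tr><td>
    <img src="/img/automatica.gif"/>
    <a href="datos.cfm?id=1">Station 1</a>
  </td></tr>
  <tr><td>
    <img src="/img/manual.gif"/>
    <a href="datos.cfm?id=2">Station 2</a>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (space): looks for an <a> *inside* the <img>.
# <img> is a void element with no children, so nothing matches.
descendant = soup.select("img[src*=automatica] a")

# Adjacent sibling combinator (+): the <a> element right after the <img>.
# Whitespace text nodes in between are ignored by this combinator.
adjacent = soup.select("img[src*=automatica] + a")

# urljoin resolves each href against the base URL.
urls = [urljoin(BASE, a["href"]) for a in adjacent]

print(len(descendant))  # 0
print(urls)  # ['http://meteo.navarra.es/estaciones/datos.cfm?id=1']
```

For plain relative hrefs, urljoin and concatenation give the same result; urljoin additionally handles root-relative (/...) or absolute hrefs correctly, should the page contain any.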