Find only certain elements in table with Beautiful Soup

I'm trying to get the href attributes from a table in this

CodePudding user response：

You can use

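# x.get('src', '') avoids a KeyError on img tags that have no src attribute
for img in soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x.get('src', '')):
    print(img.next_sibling.next_sibling['href'])

Notes:

soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x.get('src', '')) - fetches all img nodes whose src attribute contains automatica.gif; img tags without a src are skipped instead of raising a KeyError
img.next_sibling.next_sibling['href'] - gets the href value of the second sibling of each matched img tag (the first sibling is presumably the whitespace text node between the img and the a tag).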
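
To illustrate the sibling navigation described in the notes, here is a minimal sketch on an inline snippet. The markup and the file names station-a.html and station-b.html are hypothetical stand-ins, not taken from the real page. It uses find_next_sibling('a') in place of two chained next_sibling calls, which should keep working if the whitespace between the img and the a tag changes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two table cells; the real page may differ.
html = """
<td>
<img src="images/automatica.gif">
<a href="station-a.html">Station A</a>
</td>
<td>
<img src="images/manual.gif">
<a href="station-b.html">Station B</a>
</td>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
# The src filter receives the attribute value, or None when src is missing.
for img in soup.find_all("img", src=lambda s: s and "automatica.gif" in s):
    # Skip text nodes and go straight to the next <a> sibling.
    link = img.find_next_sibling("a")
    if link is not None and link.has_attr("href"):
        hrefs.append(link["href"])

print(hrefs)
```

Only the link that follows the automatica.gif image is collected; the one after manual.gif is ignored.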

CodePudding user response：

By preference I would use CSS selectors for speed, and simply filter on img elements whose src contains automatica. Then move to the adjacent a tag with the adjacent sibling combinator (+) and extract the href.

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://meteo.navarra.es/estaciones/descargardatos.cfm')
soup = bs(r.content, 'lxml')
automaticas = ['http://meteo.navarra.es/estaciones/' + i['href'] for i in soup.select('img[src*=automatica] + a')]
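
The selector approach can also be tried without a network request. The following is a rough sketch only: the helper name automatic_station_links, the sample markup, and the file names are assumptions made for illustration. It also swaps the string concatenation for urllib.parse.urljoin, which resolves relative hrefs against the page URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Page URL taken from the answer above.
PAGE_URL = "http://meteo.navarra.es/estaciones/descargardatos.cfm"


def automatic_station_links(html, page_url=PAGE_URL):
    """Return absolute URLs of <a> tags directly following an 'automatica' image.

    Hypothetical helper, written for this example only.
    """
    soup = BeautifulSoup(html, "html.parser")
    # 'img + a' matches an <a> that is the next element sibling of the <img>;
    # text nodes between them are ignored by the combinator.
    anchors = soup.select('img[src*="automatica"] + a[href]')
    return [urljoin(page_url, a["href"]) for a in anchors]


# Hypothetical sample markup; the real table may be structured differently.
sample = (
    '<td><img src="img/automatica.gif"> <a href="station-a.html">A</a></td>'
    '<td><img src="img/manual.gif"> <a href="station-b.html">B</a></td>'
)

links = automatic_station_links(sample)
print(links)
```

Passing the html string in separately keeps the parsing logic testable; for the live page you would pass r.content from the requests call instead of sample.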