Retrieving data from HTML using the child path (XPath) with Python

Time:05-03

I'm trying to get the city's email address from http://www.comuni-italiani.it/110/index.html

I have the specific child path, found with XPath Finder: /html/body/span[3]/table[2]/tbody/tr[1]/td[2]/table/tbody/tr[11]/td/b/a. Now I'm trying to retrieve the email from this page, but I know very little about the BeautifulSoup library (I'm just getting started). After reading several guides I managed to write the following code, but I can't get it to follow the child path correctly:

from bs4 import BeautifulSoup
import requests
  
# sample web page
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
child_soup = soup.find('span')
  
for i in child_soup.children:
    print("child :  ", i)

What am I doing wrong?
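A likely part of the problem: soup.find('span') returns the first <span> in the whole document, while the XPath step span[3] means the third <span> directly under <body>, and .children lists only direct children, so the loop never descends along the path. Browsers also insert <tbody> into tables when they build the DOM, so a path copied from a browser tool can contain tbody steps that are absent from the raw HTML that html.parser sees. The sketch below is an offline illustration under stated assumptions: follow_xpath is a hypothetical helper that only understands simple name or name[n] steps, and the markup is a made-up stand-in reduced to the elements on the question's path, not the real page's HTML.

```python
import re

from bs4 import BeautifulSoup

# XPath copied from the browser tool, as given in the question.
XPATH = "/html/body/span[3]/table[2]/tbody/tr[1]/td[2]/table/tbody/tr[11]/td/b/a"


def follow_xpath(root, xpath):
    """Hypothetical helper: follow a simple absolute XPath made only of
    name or name[n] steps, each matching direct children of the current node.
    A 'tbody' step is skipped when no such child exists, because browsers
    insert <tbody> into the DOM even when the raw HTML has none."""
    node = root
    for step in xpath.strip("/").split("/"):
        match = re.fullmatch(r"([A-Za-z][\w-]*)(?:\[(\d+)\])?", step)
        if match is None:
            raise ValueError(f"unsupported XPath step: {step!r}")
        name, position = match.group(1), int(match.group(2) or 1)
        children = node.find_all(name, recursive=False)
        if name == "tbody" and not children:
            continue  # browser-only <tbody>: stay on the current <table>
        # XPath positions are 1-based and count same-named siblings only.
        if position < 1 or position > len(children):
            return None
        node = children[position - 1]
    return node


# Made-up stand-in for the real page, reduced to the elements on the path.
filler_rows = "".join(f"<tr><td>row {i}</td></tr>" for i in range(1, 11))
html = (
    "<html><body>"
    "<span>one</span><span>two</span>"
    "<span>"
    "<table><tr><td>header</td></tr></table>"
    "<table><tr><td>left</td><td><table>"
    + filler_rows
    + '<tr><td><b><a href="mailto:info@example.it">info@example.it</a></b></td></tr>'
    "</table></td></tr></table>"
    "</span>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# soup.find("span") is the FIRST <span> in the document, not span[3].
print(soup.find("span").get_text())  # one

link = follow_xpath(soup, XPATH)
print(link["href"])  # mailto:info@example.it
```

On the real page the positions in a copied XPath may not line up with the raw HTML, which is one reason a selector keyed on the link itself, like the one in the answer, tends to be sturdier than a long positional path.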

Answer:

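Please find my attempt to solve your problem below. It starts the same way as your code, then uses a CSS selector instead of walking the path: 'b > a[href^="mail"]' matches the first <a> that is a direct child of a <b> and whose href starts with "mail" (the mailto: link). Splitting the href on the colon then drops the "mailto:" prefix, leaving just the address.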

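from bs4 import BeautifulSoup
import requests

sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
page = requests.get(sample_web_page)
soup = BeautifulSoup(page.content, "html.parser")

# First <a> directly inside a <b> whose href starts with "mail" (the mailto: link)
link = soup.select_one('b > a[href^="mail"]')
if link is not None:
    # Drop the "mailto:" prefix, splitting only on the first colon
    print(link['href'].split(':', 1)[1])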
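To see exactly what the selector matches without hitting the live site, here is a minimal offline sketch; the markup and the example.it addresses are made up for illustration. Only the last link qualifies: the first <a> is not inside a <b>, and the second one's href does not start with "mail". It also shows that select_one returns None when nothing matches, which is why indexing its result directly can fail.

```python
from bs4 import BeautifulSoup

# Made-up markup: only the last link is an <a> directly inside <b>
# whose href starts with "mail".
html = """
<p><a href="mailto:ignored@example.it">not inside b</a></p>
<b><a href="http://www.example.it">website</a></b>
<b><a href="mailto:info@example.it">info@example.it</a></b>
"""

soup = BeautifulSoup(html, "html.parser")

# "b > a": <a> that is a direct child of <b>;
# [href^="mail"]: href attribute value starts with "mail".
link = soup.select_one('b > a[href^="mail"]')
email = link["href"].split(":", 1)[1] if link is not None else None
print(email)  # info@example.it

# No <a> sits directly inside an <i>, so select_one returns None here.
print(soup.select_one('i > a[href^="mail"]'))  # None
```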