Web Scraping Emails using Python

Time:08-29

I'm new to web scraping (using Python) and ran into a problem trying to get an email address from a university athletic department's site.

I've managed to navigate to the email I want to extract, but I don't know where to go from here. When I print what I have, all I get is '' instead of the actual email text.

I'm attaching what I have so far; let me know if it needs a better explanation.

Here's the website I'm trying to scrape: https://goheels.com/staff-directory

Thanks!

Here's my code:

from bs4 import BeautifulSoup
import requests

# Read all whitespace-separated URLs from websites.txt
with open('websites.txt', 'r') as f:
    urls = f.read().split()

print(urls)

for url in urls:

    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'html.parser')
    try:

        body = soup.find(headers="col-staff_email category-0")
        links = body.a
        print(links)
    except Exception as e:
        print(f"This url didn't work: {url} ({e})")

Answer:

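The emails are hidden inside a <script> element in each email cell: every address is split into two JavaScript variables, firstHalf and secondHalf, and requests does not execute JavaScript. With a CSS selector and some string splitting you can rebuild them: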

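import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get('https://goheels.com/staff-directory').text, 'html.parser')
# Each email cell holds a <script> defining firstHalf and secondHalf
for em in soup.select('td[headers*="col-staff_email"] script'):
    target = em.text.split('var firstHalf = "')[1]
    fh = target.split('";')[0]
    lh = target.split('var secondHalf = "')[1].split('";')[0]
    print(fh + '@' + lh)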

Output: one email address per line, one for each staff member listed in the directory.
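The chained split() calls above raise an IndexError on any <script> that lacks the expected markers. A minimal alternative sketch uses a regular expression instead, assuming the script text follows the `var firstHalf = "...";` / `var secondHalf = "...";` pattern that those split() calls expect. The names `EMAIL_PARTS` and `email_from_script` are hypothetical, and the sample script is made up for illustration, not copied from the site:

```python
import re
from typing import Optional

# Assumed script shape, inferred from the split markers in the answer:
#   var firstHalf = "...";  ...  var secondHalf = "...";
EMAIL_PARTS = re.compile(
    r'var\s+firstHalf\s*=\s*"([^"]*)";.*?var\s+secondHalf\s*=\s*"([^"]*)";',
    re.DOTALL,  # let .*? span the newlines between the two declarations
)


def email_from_script(script_text: str) -> Optional[str]:
    """Rebuild an address from the two halves, or return None if absent."""
    match = EMAIL_PARTS.search(script_text)
    if match is None:
        return None
    first_half, second_half = match.groups()
    return first_half + "@" + second_half


# Hypothetical script contents; the real page's script may differ.
sample = '''
var firstHalf = "jdoe";
var secondHalf = "example.edu";
document.write(firstHalf + "@" + secondHalf);
'''
print(email_from_script(sample))        # jdoe@example.edu
print(email_from_script("var x = 1;"))  # None
```

In the answer's loop, `email_from_script(em.text)` could stand in for the three split lines, with a `None` check to skip cells whose script doesn't contain an address.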
