Python scraping email protection address from href link

Time:07-01

I want to get email addresses from listing pages such as: https://thenationalweddingdirectory.com.au/suppliers/wedding-venues/queensland/the-dock-mooloolaba-events/

This is the code I have so far, but how can I scrape the email address from the link that gets clicked?

from requests_html import HTMLSession    

url = 'https://thenationalweddingdirectory.com.au/explore/?category=wedding-venues&region=melbourne&sort=top-rated'

s = HTMLSession()
r = s.get(url)

r.html.render(sleep=1)
products = r.html.xpath('//*[@id="finderListings"]/div[2]', first=True)

for item in products.absolute_links:
    # fetch each listing page and print its call-to-action link
    r = s.get(item)
    print(r.html.find('li.lmb-calltoaction a', first=True))

CodePudding user response:

The email and telephone number are already on each listing page: there is a JSON block containing all the info you need.
There is also an "ajax" request that returns all the URLs to visit.

import json
from bs4 import BeautifulSoup
import requests
import re

params = {
    'mylisting-ajax': '1',
    'action': 'get_listings',
    'form_data[page]': '0',
    'form_data[preserve_page]': 'false',
    'form_data[category]': 'wedding-venues',
    'form_data[region]': 'melbourne',
    'form_data[sort]': 'top-rated',
    'listing_type': 'place',
}

response = requests.get('https://thenationalweddingdirectory.com.au/', params=params)
# get all listing URLs; the response escapes slashes as "\/", so strip the backslashes first
results = re.findall(r"https://thenationalweddingdirectory\.com\.au/suppliers/wedding-venues/melbourne/[a-zA-Z-]*/",
                     response.text.replace("\\", ""))
headers = {
    'accept': '*/*',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8,es;q=0.7,ru;q=0.6',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}
for result in results:
    print("Navigate: " + result)
    response = requests.get(result, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    scripts = soup.find_all("script")
    for script in scripts:
        # the contact details are in the script block that mentions "LocalBusiness"
        if "LocalBusiness" in script.text:
            data = json.loads(script.text)
            print("Name: " + data["name"])
            print("Telephone: " + data["telephone"])
            print("Email: " + data["email"])
            break
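
The loop above raises a KeyError if a listing has no "email" or "telephone" key, and a JSONDecodeError if a matching script is not valid JSON. Here is a rough sketch of a more defensive version of the same idea, using only the standard library. It assumes the data is embedded as JSON-LD (a script tag with type "application/ld+json" and an "@type" of "LocalBusiness"); I have not confirmed that for every page on the site. The helper name extract_local_business and the sample HTML are made up for illustration.

```python
import json
import re

# Assumption: the listing data sits in <script type="application/ld+json"> blocks.
LD_JSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_local_business(html):
    """Return the first JSON-LD object whose @type is LocalBusiness, or None."""
    for match in LD_JSON_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip script blocks that are not valid JSON
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "LocalBusiness":
                return item
    return None


# Made-up sample page, only to show the shape of the data.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@type": "LocalBusiness", "name": "Example Venue", "telephone": "00 0000 0000"}
</script>
</head></html>
"""

business = extract_local_business(sample_html)
# .get() returns None instead of raising when a key such as "email" is missing
print(business.get("name"), business.get("telephone"), business.get("email"))
```

In the real loop you would pass response.text to the helper and skip the listing when it returns None.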

OUTPUT:

Navigate: https://thenationalweddingdirectory.com.au/suppliers/wedding-venues/melbourne/metropolis-events/
Name: Metropolis Events
Telephone: 03 8537 7300
Email: [email protected]
Navigate: https://thenationalweddingdirectory.com.au/suppliers/wedding-venues/melbourne/cotham-dining/
Name: Cotham Dining
Telephone: 0411 931 818
Email: [email protected]
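
If you still want to decode the address from the link itself: assuming the href is a Cloudflare email-protection link (something like /cdn-cgi/l/email-protection#<hex>), the hex string after the # can be decoded by treating its first byte as an XOR key for the remaining bytes. I have not checked that this site's links use that scheme, so treat this as a sketch. The function names and the test string are made up.

```python
def decode_cfemail(encoded):
    """Decode a Cloudflare-style obfuscated email given as a hex string."""
    data = bytes.fromhex(encoded)
    key = data[0]  # first byte is the XOR key
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def email_from_href(href):
    """Pull the hex fragment out of an email-protection href and decode it."""
    _, _, fragment = href.partition("#")
    return decode_cfemail(fragment) if fragment else None


# Made-up example: "a@b.c" XOR-ed with key 0x01
print(email_from_href("/cdn-cgi/l/email-protection#016041632f62"))
```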