Selenium not picking the value against the XPATH Python

I have been trying to automate this link to get the email address with Selenium. I used the XPath //span[@]/a/@href, which looks perfectly fine, but Selenium doesn't extract the value from there.

I also tried a regex, but it didn't work either: re.findall(r'mailto:(.*?)\?sub', str(driver.page_source))

Can anyone tell me what the issue is here? Why isn't it getting the email, and how can I extract it?

from selenium import webdriver
from scrapy.selector import Selector
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import re


driver = webdriver.Chrome()
driver.get('https://www.ukparks.com/park/haighfield-park/')

WebDriverWait(driver, 7).until(
    EC.presence_of_element_located((By.XPATH, '//span[@]'))
)

response = Selector(text=driver.page_source)
email = response.xpath('//span[@]/a/@href').get()
email_re = re.findall(r'mailto:(.*?)\?sub', str(driver.page_source))

print(email)
print(email_re)

CodePudding user response：

The page seems to populate the href only after a click event on the a tag, so click the link first and then read its href attribute.

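# Reuses driver and the imports from the question's code.
wait = WebDriverWait(driver, 10)
driver.get('https://www.ukparks.com/park/haighfield-park/')
# Click the link first, then read the href once the link is clickable again.
wait.until(EC.element_to_be_clickable((By.XPATH, '//span[@]/a'))).click()
link = wait.until(EC.element_to_be_clickable((By.XPATH, '//span[@]/a'))).get_attribute("href")
print(link)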

Output:

mailto:[email protected]?subject=Enquiry from UKParks.com
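If only the address is needed rather than the whole mailto: link, the href can be split with the standard library. This is a minimal sketch, not part of the original answer: the helper name email_from_mailto and the address someone@example.com are placeholders made up for illustration.

```python
from urllib.parse import unquote, urlsplit


def email_from_mailto(href: str) -> str:
    """Return the address part of a mailto: link, without query parameters."""
    parts = urlsplit(href)
    if parts.scheme != "mailto":
        raise ValueError(f"not a mailto link: {href!r}")
    # For mailto: URLs the address is the path; ?subject=... lands in the query.
    return unquote(parts.path)


# Placeholder address, not the one from the site.
print(email_from_mailto("mailto:someone@example.com?subject=Enquiry from UKParks.com"))
```

In the Selenium snippet above, the value stored in link could be passed to this helper in place of the hard-coded string.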

CodePudding user response：

You can try the following to fetch the email from that site using requests, without a browser:

import re
import requests
from bs4 import BeautifulSoup

link = 'https://www.ukparks.com/park/haighfield-park/'

with requests.Session() as s:
    # Send a browser-like User-Agent with the request.
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    res = s.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    item = soup.select_one(".detail-box").get_text(strip=True)
    # Collect the string passed to each ehArr.push('...') call.
    email_raw = re.findall(r"ehArr\.push\('(.*?)'\);", item)
    # Join the collected fragments in reverse order to rebuild the address.
    email = ''.join(email_raw[::-1])
    print(email)

Output:

[email protected]
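The reverse-and-join step can be tried offline on a small string. This is only a sketch under stated assumptions: sample_script is a made-up stand-in for the page's inline script, the helper name rebuild_email is hypothetical, and the real page's fragments and their order may differ.

```python
import re

# Hypothetical inline script, invented for illustration only.
sample_script = (
    "var ehArr = []; ehArr.push('example.com'); "
    "ehArr.push('@'); ehArr.push('info');"
)


def rebuild_email(script_text: str) -> str:
    """Collect the ehArr.push('...') arguments and join them in reverse order."""
    fragments = re.findall(r"ehArr\.push\('(.*?)'\);", script_text)
    return "".join(reversed(fragments))


print(rebuild_email(sample_script))  # info@example.com
```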