Unable to scrape an email address from a webpage using requests module

Time:09-23

I'm trying to scrape an email address from this webpage using the requests module, not Selenium. The email address is obfuscated and not present in the page source; a JavaScript function generates it at render time. How can I use the following snippet to get the email address shown on that page?

document.write("\u003cn uers=\"znvygb:[email protected]\"\[email protected]\u003c/n\u003e".replace(/[a-zA-Z]/g, function(c){return String.fromCharCode((c<="Z"?90:122)>=(c=c.charCodeAt(0)+13)?c:c-26);}));

Here is what I've tried so far:

import requests
from bs4 import BeautifulSoup

link = 'https://www.californiatoplawyers.com/lawyer/311805/tobyn-yael-aaron'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
res = requests.get(link, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
email = soup.select_one("dt:-soup-contains('Email') + dd")
print(email)

Expected output:

[email protected]

Answer:

For tasks like this, I recommend the js2py module, which can evaluate the JavaScript expression directly:

import js2py
import requests
from bs4 import BeautifulSoup

link = "https://www.californiatoplawyers.com/lawyer/311805/tobyn-yael-aaron"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}
res = requests.get(link, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
email = soup.select_one("dt:-soup-contains('Email') + dd")

js_code = email.script.contents[0].replace("document.write", "")
email = BeautifulSoup(js2py.eval_js(js_code), "html.parser").text
print(email)

Prints:

[email protected]
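
As a possible alternative that avoids evaluating JavaScript at all: the replace callback in the question appears to shift every letter by 13 places, wrapping around within its case, which is ROT13. If that reading is right, the string literal can be decoded with the standard library alone. The sketch below is a rough illustration under that assumption; the function name decode_rot13_write and the someone@example.com sample are made up for the example, and json.loads is used here only as a convenient way to resolve escapes such as \u003c and \", not as a full JavaScript string parser.

```python
import codecs
import json
import re


def decode_rot13_write(js_source: str) -> str:
    """Return the HTML that a ROT13-obfuscated document.write(...) call would emit.

    Hypothetical helper: assumes the script has the shape
    document.write("<string literal>".replace(...)) and that the
    replace callback is plain ROT13.
    """
    # Capture the first double-quoted string literal, allowing escaped characters.
    match = re.search(r'document\.write\("((?:[^"\\]|\\.)*)"', js_source)
    if match is None:
        raise ValueError("no document.write string literal found")

    # Resolve escapes first (\u003c -> <, \" -> "), because the JavaScript
    # callback runs on the already-evaluated string, not on the raw source.
    literal = json.loads('"' + match.group(1) + '"')

    # ROT13 only touches A-Z and a-z; digits and punctuation pass through.
    return codecs.decode(literal, "rot_13")


# Made-up sample in the same shape as the script from the question.
sample = (
    r'document.write("\u003cn uers=\"znvygb:fbzrbar@rknzcyr.pbz\"'
    r'\u003efbzrbar@rknzcyr.pbz\u003c/n\u003e".replace(/[a-zA-Z]/g, f));'
)
print(decode_rot13_write(sample))
# prints: <a href="mailto:someone@example.com">someone@example.com</a>
```

In the scraper above, the input would be email.script.contents[0], and the returned HTML can be passed to BeautifulSoup(..., "html.parser").text to get the bare address, as in the js2py version.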