I'm trying to scrape the https://findamortgagebroker.com/ site.
When I request a search URL such as "https://findamortgagebroker.com/?search=San Diego&page=2", the response doesn't contain the tags that I see when I inspect the page with the browser dev tools.
I want to scrape the 'a' elements whose 'class' is 'clickable-tile-contact'.
import time
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def get_soup(url):
    # Send a browser-like User-Agent header with the request
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    time.sleep(10)
    html_page = urlopen(req).read()
    time.sleep(10)
    soup = BeautifulSoup(html_page, 'html.parser')
    return soup


url = "https://findamortgagebroker.com/?search=San Diego&page=2"
soup = get_soup(url)
links = soup.find_all('a', attrs={'class': 'clickable-tile-contact'})
CodePudding user response:
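The data you need is not part of the initial page. It is loaded afterwards by an AJAX POST request to an internal API endpoint, which returns the results as an HTML fragment.
So to get the right data, you have to send the request to the API URL instead of the search page URL.
Full working code as an example: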
import requests
from bs4 import BeautifulSoup
api_url = 'https://findamortgagebroker.com/home/SearchContacts/'
headers = {
    "content-type": "application/x-www-form-urlencoded",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
}
body = "searchModel[SearchText]=San Diego&searchModel[PageNumber]=2&searchModel[Radius]=50&searchModel[ResultsPerPage]=20&searchModel[CaptchaToken]=03AEkXODDG8q9JqC--gCpxJK_Kevp506iB5o5Z7ilzY3Ge6GbYQaoX9jcOJqEyC6TG159L5KSvPoE43UlBxGMYW2jlNcnc0ING0sFeQO2RZIOui0YnNAaByRIVrjaluwaNi7WCE2FykjJNI0B5FNLB7nJjnr9N7YEeUkY13km0wRN3vfyqPh-bVdpahCir00GzE-pQyXU_o84bY1dCWRNQten7O_cnmdcA0ucEPxFeO3WIbMkUkUqqMC5vpAUiz_VttmYMyRETidTuaI6rHE2_AjGbUr6Z61vXFr-dXAC63alA15gGu8ypGRljtHS2wmfNSSySrtegnFxD3txZZ4d2KDk4ugBXLfh3jNUHM_KcKF6Rkp0WOHx7-D-4CEfMf-mC9zJ6FnVqJx3FTZiOrwcelQ0dW1OxdHuHlCVPPQlzIzcFMfsTJOsCLj3JNZTEgkQ6Eicl6dkVV-F-CRPd4fQZ2D_u3dDmrIaCIQJJ4LlQuSYXhLt-6QMcnFXceygadkKGqeiGQZcdUeagF6c8zz9OUg5g2ppXkCu-WsH08e-ei7sRHspA3Rdwh6sylcr8fqFlxDNmEXTI4CH1nRgLvJMuXr6KdcY3AWNhwA&searchModel[IsVendorRequest]=false&searchModel[VendorIdentifier]=0&searchModel[CaptchaV2]=false"
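# POST the form body to the API endpoint
res = requests.post(api_url, data=body, headers=headers)
# print(res)
# The response body is HTML, so it can be parsed directly
soup = BeautifulSoup(res.text, 'lxml')
data = []
# Collect the href of every element with the target class
for item in soup.select('.clickable-tile-contact'):
    data.append({
        'href': item.get('href'),
    })
print(data)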
Output:
[{'href': 'https://findamortgagebroker.com/Profile\\AndresCamacho26826'}, {'href': 'https://findamortgagebroker.com/Profile\\DavidStein65836'}, {'href': 'https://findamortgagebroker.com/Profile\\DanielRamirez28222'}, {'href': 'https://findamortgagebroker.com/Profile\\DavidHolland56665'}, {'href': 'https://findamortgagebroker.com/Profile\\EvbeniiMalenko57387'}, {'href': 'https://findamortgagebroker.com/Profile\\AmirNurani66326'}, {'href': 'https://findamortgagebroker.com/Profile\\MarialuisaSarrizLira37868'}, {'href': 'https://findamortgagebroker.com/Profile\\DejaCorreia53368'}, {'href': 'https://findamortgagebroker.com/Profile\\JulioRugama72662'}, {'href': 'https://findamortgagebroker.com/Profile\\MarthaMunoz26537'}, {'href': 'https://findamortgagebroker.com/Profile\\CarlosMunoz55258'}, {'href': 'https://findamortgagebroker.com/Profile\\AndreaCutuk35775'}, {'href': 'https://findamortgagebroker.com/Profile\\LauraPardo64458'}, {'href': 'https://findamortgagebroker.com/Profile\\KatiePike37454'}, {'href': 'https://findamortgagebroker.com/Profile\\JustinGuthrie27854'}, {'href': 'https://findamortgagebroker.com/Profile\\GinoSalvaggio54863'}, {'href': 'https://findamortgagebroker.com/Profile\\AnnaValencia55287'}, {'href': 'https://findamortgagebroker.com/Profile\\ArtinMousakhan27554'}, {'href': 'https://findamortgagebroker.com/Profile\\GloriaPereira45832'}, {'href': 'https://findamortgagebroker.com/Profile\\NickKinnard38652'}]
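If you need more than one page, the hard-coded body string gets awkward to edit by hand. Below is a minimal sketch of building the same form fields from a dictionary, one body per page. The field names are taken from the body above; build_search_body and the token placeholder are hypothetical names for illustration. It assumes the server accepts percent-encoded field names (the original body sends the brackets unencoded) and that you supply a valid CaptchaToken copied from your own browser session, which may well expire or be tied to a single request.

```python
from urllib.parse import urlencode


def build_search_body(search_text, page, captcha_token, radius=50, per_page=20):
    """Return a URL-encoded form body for one page of search results.

    Hypothetical helper: the field names mirror the body string in the
    answer above; the defaults are the values used there.
    """
    fields = {
        "searchModel[SearchText]": search_text,
        "searchModel[PageNumber]": page,
        "searchModel[Radius]": radius,
        "searchModel[ResultsPerPage]": per_page,
        "searchModel[CaptchaToken]": captcha_token,
        "searchModel[IsVendorRequest]": "false",
        "searchModel[VendorIdentifier]": 0,
        "searchModel[CaptchaV2]": "false",
    }
    # urlencode escapes the brackets and turns the space into '+'
    return urlencode(fields)


# Placeholder only: replace with a token captured from the browser
token = "PASTE_TOKEN_HERE"

# One body per page, here pages 1 to 3
bodies = [build_search_body("San Diego", page, token) for page in range(1, 4)]
print(bodies[1])
```

Each string in bodies can then be passed as data= to the requests.post call shown above, with the same headers. Whether the site accepts the same token for several pages is something you would have to check yourself.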