BeautifulSoup isn't returning the HTML that I see when I inspect using the dev tools

Time:11-09

I'm trying to scrape the https://findamortgagebroker.com/ site.

When I request a search URL such as "https://findamortgagebroker.com/?search=San Diego&page=2", the response doesn't contain the tags that I see when I inspect the page with the dev tools.

I want to scrape the 'a' elements whose 'class' is 'clickable-tile-contact'.

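import time
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def get_soup(url):
    # Send a browser-like User-Agent header with the request.
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    time.sleep(10)
    html_page = urlopen(req).read()
    time.sleep(10)
    soup = BeautifulSoup(html_page, 'html.parser')
    return soup


# The space in the query string is percent-encoded so that urlopen accepts the URL.
url = "https://findamortgagebroker.com/?search=San%20Diego&page=2"

soup = get_soup(url)

# Comes back empty: the expected 'a' elements are not in the returned HTML.
links = soup.find_all('a', attrs={'class': 'clickable-tile-contact'})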

CodePudding user response:

Actually, the required data is not part of the initial page. It is loaded afterwards by an AJAX POST request to an external API endpoint, which returns the results as plain HTML. To get the right data, you have to request the API URL instead of the page URL.
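A quick way to confirm this kind of diagnosis is to check whether the class name occurs anywhere in the raw response text before parsing it. The sketch below is only an illustration: STATIC_SHELL and AJAX_FRAGMENT are made-up stand-ins for the two responses (the real markup will differ), and appears_in_raw_html is a hypothetical helper name.

```python
# Hypothetical stand-ins for the two responses; the site's real markup will differ.
STATIC_SHELL = """
<html><body>
  <div id="results"></div>
  <script src="/js/search.js"></script>
</body></html>
"""

AJAX_FRAGMENT = """
<a class="clickable-tile-contact" href="/Profile/ExampleOne">Example One</a>
<a class="clickable-tile-contact" href="/Profile/ExampleTwo">Example Two</a>
"""


def appears_in_raw_html(html: str, class_name: str) -> bool:
    """Return True if class_name occurs anywhere in the raw response text."""
    return class_name in html


# The page shell carries no tiles; the fragment fetched by the AJAX call does.
print(appears_in_raw_html(STATIC_SHELL, "clickable-tile-contact"))   # False
print(appears_in_raw_html(AJAX_FRAGMENT, "clickable-tile-contact"))  # True
```

If the class name is missing from the raw text, no parser will find it, and the next step is to look in the Network tab of the dev tools for the request that actually delivers the data.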

Full working code as an example:

import requests
from bs4 import BeautifulSoup


api_url = 'https://findamortgagebroker.com/home/SearchContacts/'

headers = {
    "content-type": "application/x-www-form-urlencoded",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}


body = "searchModel[SearchText]=San Diego&searchModel[PageNumber]=2&searchModel[Radius]=50&searchModel[ResultsPerPage]=20&searchModel[CaptchaToken]=03AEkXODDG8q9JqC--gCpxJK_Kevp506iB5o5Z7ilzY3Ge6GbYQaoX9jcOJqEyC6TG159L5KSvPoE43UlBxGMYW2jlNcnc0ING0sFeQO2RZIOui0YnNAaByRIVrjaluwaNi7WCE2FykjJNI0B5FNLB7nJjnr9N7YEeUkY13km0wRN3vfyqPh-bVdpahCir00GzE-pQyXU_o84bY1dCWRNQten7O_cnmdcA0ucEPxFeO3WIbMkUkUqqMC5vpAUiz_VttmYMyRETidTuaI6rHE2_AjGbUr6Z61vXFr-dXAC63alA15gGu8ypGRljtHS2wmfNSSySrtegnFxD3txZZ4d2KDk4ugBXLfh3jNUHM_KcKF6Rkp0WOHx7-D-4CEfMf-mC9zJ6FnVqJx3FTZiOrwcelQ0dW1OxdHuHlCVPPQlzIzcFMfsTJOsCLj3JNZTEgkQ6Eicl6dkVV-F-CRPd4fQZ2D_u3dDmrIaCIQJJ4LlQuSYXhLt-6QMcnFXceygadkKGqeiGQZcdUeagF6c8zz9OUg5g2ppXkCu-WsH08e-ei7sRHspA3Rdwh6sylcr8fqFlxDNmEXTI4CH1nRgLvJMuXr6KdcY3AWNhwA&searchModel[IsVendorRequest]=false&searchModel[VendorIdentifier]=0&searchModel[CaptchaV2]=false"
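# POST the form-encoded body to the endpoint that the page itself calls via AJAX.
res = requests.post(api_url, data=body, headers=headers)
res.raise_for_status()  # stop here if the server returned an HTTP error

# The response is an HTML fragment; the 'lxml' parser requires the lxml package.
soup = BeautifulSoup(res.text, 'lxml')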

# Collect the href of every element with the target class.
data = []
for item in soup.select('.clickable-tile-contact'):
    data.append({'href': item.get('href')})
print(data)
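The hand-written body string in the code above contains a raw space and is awkward to edit when the search text or page number changes. As a sketch, the same fields could be built from a dict with the standard library's urlencode. The field names are taken from the body string above; build_search_body and the placeholder token are hypothetical, and it is an assumption that the server accepts percent-encoded brackets in the field names and that a fresh captcha token is available.

```python
from urllib.parse import urlencode


def build_search_body(search_text, page_number, captcha_token,
                      radius=50, results_per_page=20):
    """Return a form-encoded body for the search endpoint (hypothetical helper)."""
    fields = {
        "searchModel[SearchText]": search_text,
        "searchModel[PageNumber]": page_number,
        "searchModel[Radius]": radius,
        "searchModel[ResultsPerPage]": results_per_page,
        "searchModel[CaptchaToken]": captcha_token,
        "searchModel[IsVendorRequest]": "false",
        "searchModel[VendorIdentifier]": 0,
        "searchModel[CaptchaV2]": "false",
    }
    # urlencode percent-encodes the brackets and turns the space into '+'.
    return urlencode(fields)


# PLACEHOLDER_TOKEN is not a real token; substitute one captured from the browser.
body = build_search_body("San Diego", 2, captcha_token="PLACEHOLDER_TOKEN")
print(body)
```

Passing the same dict directly as data= to requests.post would also work, since requests form-encodes dicts itself; building the string explicitly just makes the encoded body easy to inspect.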

Output:

[{'href': 'https://findamortgagebroker.com/Profile\\AndresCamacho26826'}, {'href': 'https://findamortgagebroker.com/Profile\\DavidStein65836'}, {'href': 'https://findamortgagebroker.com/Profile\\DanielRamirez28222'}, {'href': 'https://findamortgagebroker.com/Profile\\DavidHolland56665'}, {'href': 'https://findamortgagebroker.com/Profile\\EvbeniiMalenko57387'}, {'href': 'https://findamortgagebroker.com/Profile\\AmirNurani66326'}, {'href': 'https://findamortgagebroker.com/Profile\\MarialuisaSarrizLira37868'}, {'href': 'https://findamortgagebroker.com/Profile\\DejaCorreia53368'}, {'href': 'https://findamortgagebroker.com/Profile\\JulioRugama72662'}, {'href': 'https://findamortgagebroker.com/Profile\\MarthaMunoz26537'}, {'href': 'https://findamortgagebroker.com/Profile\\CarlosMunoz55258'}, {'href': 'https://findamortgagebroker.com/Profile\\AndreaCutuk35775'}, {'href': 'https://findamortgagebroker.com/Profile\\LauraPardo64458'}, {'href': 'https://findamortgagebroker.com/Profile\\KatiePike37454'}, {'href': 'https://findamortgagebroker.com/Profile\\JustinGuthrie27854'}, {'href': 'https://findamortgagebroker.com/Profile\\GinoSalvaggio54863'}, {'href': 'https://findamortgagebroker.com/Profile\\AnnaValencia55287'}, {'href': 'https://findamortgagebroker.com/Profile\\ArtinMousakhan27554'}, {'href': 'https://findamortgagebroker.com/Profile\\GloriaPereira45832'}, {'href': 'https://findamortgagebroker.com/Profile\\NickKinnard38652'}]
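Note that each href in the output contains a backslash before the profile name (it is printed doubled because Python escapes it). Browsers treat a backslash in an http(s) path like a forward slash, but other HTTP clients may not. A minimal sketch for cleaning the links, assuming the profiles are served at the forward-slash path; normalize_href is a hypothetical helper and the sample entry is copied from the output above:

```python
def normalize_href(href: str) -> str:
    """Replace backslashes in a URL with forward slashes."""
    return href.replace("\\", "/")


# One entry copied from the output above; the string holds a single backslash.
sample = [{'href': 'https://findamortgagebroker.com/Profile\\AndresCamacho26826'}]

cleaned = [{'href': normalize_href(entry['href'])} for entry in sample]
print(cleaned)
# [{'href': 'https://findamortgagebroker.com/Profile/AndresCamacho26826'}]
```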