import requests
import lxml
from bs4 import BeautifulSoup

LISTINGS_URL = 'https://shorturl.at/ceoAB'

# Browser-like headers so the request is not rejected outright.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/95.0.4638.69 Safari/537.36 ",
    "Accept-Language": "en-US,en;q=0.9"
}

response = requests.get(LISTINGS_URL, headers=headers)
listings = response.text


class DataScraper:
    def __init__(self):
        # Parse the HTML that was downloaded above.
        self.soup = BeautifulSoup(listings, "html.parser")

    def get_links(self):
        # Print every anchor tag found inside a listing card.
        for a in self.soup.select(".list-card-top a"):
            print(a)
        # listing_text = [link.getText() for link in links]

    def get_address(self):
        pass

    def get_prices(self):
        pass
I have used the correct CSS selectors, and I have even tried finding the elements using attrs in find_all(). What I am trying to achieve is to parse all the anchor tags and then fetch the href links for the specific listings. However, it only returns the first 10.
CodePudding user response:
You can make a GET request to this endpoint and fetch the data you need.
https://www.zillow.com/search/GetSearchPageState.htm?searchQueryState={"pagination":{"currentPage":1},"mapBounds":{"west":-123.33522421253342,"east":-121.44008261097092,"south":37.041584214606814,"north":38.39290664366326},"isMapVisible":false,"filterState":{"price":{"max":872627},"beds":{"min":1},"isForSaleForeclosure":{"value":false},"monthlyPayment":{"max":3000},"isAuction":{"value":false},"isNewConstruction":{"value":false},"isForRent":{"value":true},"isForSaleByOwner":{"value":false},"isComingSoon":{"value":false},"isForSaleByAgent":{"value":false}},"isListVisible":true,"mapZoom":9}&wants={"cat1":["listResults"]}
Change the "currentPage" URL parameter value in the above URL to fetch data from different pages.
Since the response is JSON, you can easily parse it and extract the information using the json module.
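As a rough sketch of the pagination idea, the query string can be built from a Python dict so that only "currentPage" changes between requests. The helper name build_search_url is hypothetical, and the filterState below is a shortened copy of the one in the URL above (an assumption made for readability; the full state would be used in practice). Whether the endpoint still accepts this form is not something this sketch can confirm.

```python
import copy
import json
from urllib.parse import urlencode

BASE_URL = "https://www.zillow.com/search/GetSearchPageState.htm"

# Shortened copy of the searchQueryState from the URL above.
# Assumption: the omitted filterState flags would be added back in real use.
SEARCH_QUERY_STATE = {
    "pagination": {"currentPage": 1},
    "mapBounds": {
        "west": -123.33522421253342,
        "east": -121.44008261097092,
        "south": 37.041584214606814,
        "north": 38.39290664366326,
    },
    "isMapVisible": False,
    "filterState": {
        "price": {"max": 872627},
        "beds": {"min": 1},
        "monthlyPayment": {"max": 3000},
        "isForRent": {"value": True},
    },
    "isListVisible": True,
    "mapZoom": 9,
}
WANTS = {"cat1": ["listResults"]}


def build_search_url(page):
    """Return the endpoint URL with "currentPage" set to `page` (hypothetical helper)."""
    state = copy.deepcopy(SEARCH_QUERY_STATE)  # leave the template untouched
    state["pagination"]["currentPage"] = page
    query = urlencode({
        "searchQueryState": json.dumps(state, separators=(",", ":")),
        "wants": json.dumps(WANTS, separators=(",", ":")),
    })
    return f"{BASE_URL}?{query}"


# Build the URLs for the first three result pages.
for page in range(1, 4):
    print(build_search_url(page))
```

Each URL can then be passed to requests.get() together with the headers from the question, and the body read with response.json().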
CodePudding user response:
The website is probably using lazy loading, so you can either use something like Selenium/Puppeteer or use the website's API (which will be the easier way). To do this, make a GET request to a URL that starts with https://www.zillow.com/search/GetSearchPageState.htm (see the network tab of your browser's dev tools), parse the JSON response, and you will find each href link under cat1.searchResults.listResults[index in array].detailUrl.
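A minimal sketch of walking that path once the JSON has been parsed is shown here. SAMPLE_RESPONSE is made-up data that only mirrors the cat1.searchResults.listResults[...].detailUrl structure described above, and extract_detail_urls is a hypothetical helper name. It assumes detailUrl may be either relative or absolute, so urljoin is used to handle both.

```python
import json
from urllib.parse import urljoin

# Made-up payload that mirrors only the key path described above.
SAMPLE_RESPONSE = """
{"cat1": {"searchResults": {"listResults": [
  {"detailUrl": "/b/example-listing-one/"},
  {"detailUrl": "https://www.zillow.com/homedetails/example-listing-two/"},
  {"address": "entry without a detailUrl"}
]}}}
"""


def extract_detail_urls(payload, base="https://www.zillow.com"):
    """Collect detailUrl values from cat1.searchResults.listResults (hypothetical helper)."""
    results = (
        payload.get("cat1", {})
        .get("searchResults", {})
        .get("listResults", [])
    )
    # urljoin leaves absolute URLs alone and resolves relative ones against base.
    return [urljoin(base, item["detailUrl"]) for item in results if item.get("detailUrl")]


links = extract_detail_urls(json.loads(SAMPLE_RESPONSE))
for link in links:
    print(link)
```

With a real response, json.loads(SAMPLE_RESPONSE) would be replaced by response.json() from the GET request.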