Can't get the full HTML page with Beautiful Soup

Time:03-13

I'm trying to get the content of this webpage: https://www.zillow.com/homes/for_rent/1-_beds/?searchQueryState={"pagination":{},"mapBounds":{"west":-122.67022170019531,"east":-122.19643629980469,"south":37.615282466144976,"north":37.93495488175342},"mapZoom":11,"isMapVisible":true,"filterState":{"price":{"max":872627},"beds":{"min":1},"fore":{"value":false},"mp":{"max":3000},"nc":{"value":false},"fr":{"value":true},"cmsn":{"value":false},"fsba":{"value":false}},"isListVisible":true}
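The searchQueryState parameter in the URL above is a JSON object (map bounds, zoom level and a filterState object) passed as a query-string value. A minimal sketch for reading it back into a Python dict with only the standard library; the helper name read_search_query_state is hypothetical, and the example uses a shortened copy of the URL:

```python
import json
from urllib.parse import parse_qs, urlsplit


def read_search_query_state(url: str) -> dict:
    """Decode the JSON object stored in a URL's searchQueryState parameter (hypothetical helper)."""
    # parse_qs undoes percent-encoding, so the encoded form copied from the
    # address bar works; the raw form works as long as the JSON contains no
    # '&', '+', '%' or '#' characters.
    params = parse_qs(urlsplit(url).query)
    return json.loads(params["searchQueryState"][0])


if __name__ == "__main__":
    url = ('https://www.zillow.com/homes/for_rent/1-_beds/?searchQueryState='
           '{"pagination":{},"mapZoom":11,"filterState":{"mp":{"max":3000},"fr":{"value":true}}}')
    state = read_search_query_state(url)
    print(state["filterState"])  # {'mp': {'max': 3000}, 'fr': {'value': True}}
```

Comparing these keys with the request quoted in the answer, short names such as "mp" appear to correspond to longer ones such as "monthlyPayment" (both carry max 3000); that mapping is an inference, not documented behavior.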

I can't get all of it: many elements are empty. I was told this is because the content is generated by JavaScript, which bs4 can't execute, and that I have to use Selenium instead, but I want to do it with bs4 and I know there is a way to do so. I was also told that it happens because I wasn't in the correct iframe, but that doesn't seem to be true. For example, if you inspect one of the listed prices (e.g. $2,200/mo), you will see that it is contained in a ul list where each listed apartment is an li element. But when I scrape the page with bs4, most of these li elements are empty. Also, bear in mind that I'm a newbie in web scraping and in Python, so please go easy on me. Thanks!

Here is the code I'm using to get the page's HTML:

import requests
from bs4 import BeautifulSoup

self.response = requests.get(url=URL, headers=headers)
self.html_doc = self.response.text
self.soup = BeautifulSoup(self.html_doc, 'html.parser')
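One way to confirm the symptom described above is to count how many li elements in the raw response contain no text at all. The sketch below does this with the standard library's html.parser so it runs without extra installs; ListItemAudit and audit_list_items are hypothetical names, and because li end tags are optional in HTML, implicitly closed items are only approximated.

```python
from html.parser import HTMLParser


class ListItemAudit(HTMLParser):
    """Count <li> elements in raw HTML and how many of them contain no text."""

    def __init__(self) -> None:
        super().__init__()
        self._open: list[bool] = []  # one flag per open <li>: has it seen text yet?
        self.total = 0
        self.empty = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.total += 1
            self._open.append(False)

    def handle_data(self, data):
        # Any non-whitespace text marks every enclosing <li> as non-empty.
        if data.strip():
            self._open = [True] * len(self._open)

    def handle_endtag(self, tag):
        if tag == "li" and self._open:
            if not self._open.pop():
                self.empty += 1

    def close(self):
        super().close()
        # Items never explicitly closed are counted once the document ends.
        self.empty += self._open.count(False)
        self._open.clear()


def audit_list_items(html: str) -> tuple[int, int]:
    """Return (total <li> count, empty <li> count) for a raw HTML string."""
    parser = ListItemAudit()
    parser.feed(html)
    parser.close()
    return parser.total, parser.empty
```

Calling audit_list_items(self.response.text) on the response from the code above shows whether most items are empty. With the question's existing soup object, roughly the same count is sum(1 for li in self.soup.find_all("li") if not li.get_text(strip=True)). A high empty count is consistent with the listings being filled in by JavaScript after the page loads, which neither requests nor BeautifulSoup runs.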

Answer:

Yes, this site uses React. Open the browser developer tools in Chrome or Firefox, go to the Network tab, and look at which files and requests your browser makes. Check the call stack and the other request details to find the request that carries the data. In the Network tab I see this link: https://www.zillow.com/search/GetSearchPageState.htm?searchQueryState={"pagination":{},"mapBounds":{"west":-122.83501662207031,"east":-122.03164137792969,"south":37.548623602126355,"north":38.00126648128239},"mapZoom":11,"isMapVisible":true,"category":"cat2","filterState":{"price":{"max":872627},"beds":{"min":1},"isForSaleForeclosure":{"value":false},"monthlyPayment":{"max":3000},"isNewConstruction":{"value":false},"isComingSoon":{"value":false},"isForSaleByAgent":{"value":false},"sortSelection":{"value":"globalrelevanceex"}},"isListVisible":true}&wants={"cat2":["listResults","mapResults"],"cat1":["total"]}&requestId=6. React builds the page from the data this request returns. Sorry, my English is not good, but I hope this helps.
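Following this answer, one option is to request that JSON endpoint directly instead of the rendered page. A minimal sketch that rebuilds the URL from Python dicts, assuming the endpoint still accepts the same searchQueryState, wants and requestId parameters seen in the Network tab (it is undocumented and may change, require cookies or browser-like headers, or block scripted clients); build_search_url and SEARCH_ENDPOINT are hypothetical names:

```python
import json
from urllib.parse import urlencode

# Endpoint observed in the browser's Network tab (assumption: undocumented, may change).
SEARCH_ENDPOINT = "https://www.zillow.com/search/GetSearchPageState.htm"


def build_search_url(query_state: dict, wants: dict, request_id: int) -> str:
    """Serialize the JSON query parameters into a GetSearchPageState URL (hypothetical helper)."""
    params = {
        # Compact separators match the JSON shown in the request above.
        "searchQueryState": json.dumps(query_state, separators=(",", ":")),
        "wants": json.dumps(wants, separators=(",", ":")),
        "requestId": request_id,
    }
    # urlencode percent-encodes the braces, quotes and colons in the JSON values.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    query_state = {
        "pagination": {},
        "mapBounds": {"west": -122.83501662207031, "east": -122.03164137792969,
                      "south": 37.548623602126355, "north": 38.00126648128239},
        "mapZoom": 11,
        "isMapVisible": True,
        "category": "cat2",
        "filterState": {
            "price": {"max": 872627},
            "beds": {"min": 1},
            "isForSaleForeclosure": {"value": False},
            "monthlyPayment": {"max": 3000},
            "isNewConstruction": {"value": False},
            "isComingSoon": {"value": False},
            "isForSaleByAgent": {"value": False},
            "sortSelection": {"value": "globalrelevanceex"},
        },
        "isListVisible": True,
    }
    wants = {"cat2": ["listResults", "mapResults"], "cat1": ["total"]}
    print(build_search_url(query_state, wants, request_id=6))
```

The resulting URL could then be fetched with the question's headers, e.g. data = requests.get(url, headers=headers).json(), and data.keys() inspected to locate the listings. The layout of that response is not described in this thread, and .json() raises an error if the server answers with an HTML page (such as a bot check) instead of JSON.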

Answer:

It appears that I have to use Selenium to do the job. Thanks, everyone, for your participation!
