The code below extracts data from a Zillow sale listings page.
My first question: where do people get the headers information?
My second question: how do I know when I need headers? For some other pages, like Cars.com, I don't need to pass headers=headers and I can still get the data correctly.
Thank you for your help. HHC
import requests
from bs4 import BeautifulSoup

url = 'https://www.zillow.com/baltimore-md-21201/?searchQueryState={"pagination":{},"usersSearchTerm":"21201","mapBounds":{"west":-76.67377295275878,"east":-76.5733510472412,"south":39.26716345016057,"north":39.32309233550334},"regionSelection":[{"regionId":66811,"regionType":7}],"isMapVisible":true,"filterState":{"ah":{"value":true}},"isListVisible":true,"mapZoom":14}'

# Browser-like headers so the request is not rejected as a bot
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    'referer': 'https://www.zillow.com/new-york-ny/rentals/2_p/?searchQueryState={"pagination'
}

raw_page = requests.get(url, headers=headers)
print(raw_page.status_code)

# Load the page content into Beautiful Soup
page_soup = BeautifulSoup(raw_page.content, 'html.parser')
print(page_soup)
CodePudding user response:
You can get the headers by visiting the site in your browser and opening the Network tab of the developer tools; select a request there and you can see the headers your browser sent with it. Copy the ones you need into your script.
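For example, here is a minimal sketch. The header names and values below are just what one browser might send, copied from a single request in the Network tab; they are placeholders, not values Zillow specifically requires:

import requests

# Hypothetical headers copied from one request in the browser's Network
# tab; your own browser will show different values.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'accept-language': 'en-US,en;q=0.9',
}

response = requests.get('https://www.zillow.com/baltimore-md-21201/', headers=headers)
print(response.status_code)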
Some websites don't serve bots, so to make them think you're not a bot you set the user-agent header to one a browser uses. Some sites may require more headers before they treat you as a real browser. You can see all the headers being sent in the developer tools and test different combinations until your request succeeds, as in the sketch below.
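One way to find out which headers a site actually checks is to start with a bare request and add headers one set at a time, watching the status code. A minimal sketch, where the URL and header values are assumptions for the test, not the exact ones Zillow requires:

import requests

url = 'https://www.zillow.com/baltimore-md-21201/'

# Candidate header sets, from a bare request to progressively more
# browser-like ones.
candidates = [
    {},  # no headers: requests' default user agent
    {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'},
    {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
     'accept-language': 'en-US,en;q=0.9'},
]

for headers in candidates:
    response = requests.get(url, headers=headers)
    # 200 usually means the page was served; 403 or a captcha page
    # usually means the site flagged the request as a bot.
    print(sorted(headers), '->', response.status_code)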