I'm a newbie learning Python. While using BeautifulSoup and Requests to scrape "https://batdongsan.com.vn/nha-dat-ban-tp-hcm" to collect housing price data for my hometown, I get blocked with a 403 error even though I have tried setting a User-Agent header. Here is my code:
```python
import requests

url3 = "https://batdongsan.com.vn/nha-dat-ban-tp-hcm"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49"}
page = requests.get(url3, headers=headers)
print(page)
```
Result: `<Response [403]>`
Has anyone tried and succeeded in getting past the same problem? Any help is highly appreciated.
Many thanks
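A 403 that persists even with a browser-like User-Agent often means an anti-bot layer is rejecting the request rather than the page itself being forbidden. The answer below suggests Cloudflare; assuming that, a minimal heuristic sketch for checking it relies on Cloudflare normally sending a `Server: cloudflare` header and a `CF-RAY` header. The function name `looks_like_cloudflare_block` is hypothetical, and a `False` result does not rule out other blockers.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic (assumption): True if a 403/503 response appears to come from Cloudflare."""
    # Normalise header names so this works with plain dicts as well as
    # requests' case-insensitive header mapping.
    lowered = {key.lower(): value for key, value in headers.items()}
    from_cloudflare = lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    return status_code in (403, 503) and from_cloudflare
```

With the code above, `print(looks_like_cloudflare_block(page.status_code, page.headers))` would show whether the 403 is likely Cloudflare's challenge page, in which case changing the User-Agent alone is usually not enough.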
Answer:
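```python
from bs4 import BeautifulSoup
import cloudscraper

# cloudscraper is a requests.Session subclass that tries to pass Cloudflare's anti-bot challenge
scraper = cloudscraper.create_scraper()
response = scraper.get("https://batdongsan.com.vn/nha-dat-ban-tp-hcm")
soup = BeautifulSoup(response.text, "html.parser")
print(soup.text)  # do what you want with the response
```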
You can install cloudscraper with `pip install cloudscraper`.
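If you need more than one page, a minimal sketch could reuse a single cloudscraper session (so any cookies set while passing the challenge are kept), pause between requests, and raise on an error status instead of silently parsing an error page. The function name `fetch_pages`, the `delay_seconds` default, and the idea of passing in your own list of page URLs are assumptions for illustration, not part of cloudscraper.

```python
import time


def fetch_pages(urls, session=None, delay_seconds=2.0):
    """Hypothetical helper: fetch each URL with one shared session, return {url: html}.

    `session` is anything with a requests-style .get(url) method; by default a
    cloudscraper session is created (requires `pip install cloudscraper`).
    """
    if session is None:
        import cloudscraper  # imported lazily so the helper works with any session object
        session = cloudscraper.create_scraper()

    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # be polite: pause between consecutive requests
        response = session.get(url)
        response.raise_for_status()  # raises for 4xx/5xx, e.g. a 403 that was not bypassed
        pages[url] = response.text
    return pages
```

For example, `pages = fetch_pages(["https://batdongsan.com.vn/nha-dat-ban-tp-hcm"])`, then parse each HTML string with BeautifulSoup. The URLs of further listing pages would have to be taken from the site itself.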