I have looked at other questions on Stack Overflow about the HTTP 403 error, but I have not found a solution there.
I would like to get a 200 response instead of a 403.
I am trying to scrape this URL: https://angel.co/startups
import random

import requests

# Reuse one session so cookies set by the first request are kept for later ones
my_session = requests.Session()
for_cookies = my_session.get('https://angel.co/startups')
cookies = for_cookies.cookies

user_agents_list = [
    'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
]

# Request the page again with those cookies and a randomly chosen User-Agent
response = my_session.get(
    'https://angel.co/startups',
    cookies=cookies,
    headers={'User-Agent': random.choice(user_agents_list)},
)
print(response.text)
print(response.status_code)  # 403
When I run this code, I get a 403 error instead of the whole HTML page.
CodePudding user response:
The 403 may be caused by Cloudflare or some other kind of bot protection.
If so, try cloudscraper to get past it:
import cloudscraper
url = "https://angel.co/startups"
scraper = cloudscraper.create_scraper()
response = scraper.get(url)
text = response.text
print(response.status_code)
Output
200
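If you want to check whether Cloudflare is really what is returning the 403 before switching libraries, you can look at the response headers. Below is a minimal sketch, not a definitive test: the helper name looks_like_cloudflare_block is made up for illustration, and it assumes a blocked response carries a Server header containing "cloudflare" or a CF-RAY header. Other protection services will not match.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic guess: does this response look like a Cloudflare block?

    Hypothetical helper for illustration; the header checks are assumptions.
    """
    # Lower-case names and values so the check also works with a plain dict
    lowered = {name.lower(): str(value).lower() for name, value in headers.items()}
    served_by_cloudflare = (
        "cloudflare" in lowered.get("server", "") or "cf-ray" in lowered
    )
    # 403 and 503 are the status codes treated as "blocked" in this sketch
    return status_code in (403, 503) and served_by_cloudflare


# Offline example with made-up header values
sample_headers = {"Server": "cloudflare", "CF-RAY": "hypothetical-id"}
print(looks_like_cloudflare_block(403, sample_headers))  # True
```

With requests you would call it as looks_like_cloudflare_block(response.status_code, response.headers). If it returns False, the 403 probably has another cause and cloudscraper may not help.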