Hope someone has the answer.
I'm trying to scrape a specific website.
The problem is that requests.get(url) scrapes everything very quickly, so I get blocked.
Is there a way to slow down requests.get(url)?
Thank you for your help.
from bs4 import BeautifulSoup
import requests
url = 'website.fr'
response = requests.get(url)
print(response)
Print result: "As you were using this website, something about your browser or behavior made us think you might be a bot. Solve the captcha below to continue browsing the site."
CodePudding user response:
You could configure a timeout and see if the scraper works for you. Note that timeout only caps how long requests waits for the server to respond; it does not add a delay between requests.
r = requests.get('https://github.com', timeout=5)
https://docs.python-requests.org/en/latest/user/advanced/#timeouts
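If the real goal is to space out repeated attempts, requests itself has no built-in delay, but you can mount urllib3's Retry with a backoff_factor on a Session. A minimal sketch; the retry count, backoff value, and status codes below are example choices, not requirements:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# retry up to 5 times, waiting progressively longer between attempts,
# whenever the server answers with one of the listed status codes
retries = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

r = session.get('https://github.com', timeout=5)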
CodePudding user response:
Note: Since there is no information about the actual URL, this is hard to reproduce.
A first approach could be to add some headers to your request. This will not slow anything down, but it makes the request look like it comes from a browser. An alternative approach is to use Selenium (see the sketch after the headers example below).
Example
from bs4 import BeautifulSoup
import requests
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
"X-Amzn-Trace-Id": "Root=1-61acac03-6279b8a6274777eb44d81aae",
"X-Client-Data": "CJW2yQEIpLbJAQjEtskBCKmdygEIuevKAQjr8ssBCOaEzAEItoXMAQjLicwBCKyOzAEI3I7MARiOnssB" }
url = 'https://www.france.fr/fr'
response = requests.get(url, headers = headers)
print(response)
Output
<Response [200]>
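For the Selenium alternative mentioned above, a rough sketch could look like the following. It drives a real browser, which many bot checks accept; the URL matches the example above, the 3-second pause is an arbitrary value, and depending on your Selenium version you may need a matching chromedriver on your PATH:
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get('https://www.france.fr/fr')
time.sleep(3)  # give the page time to render
html = driver.page_source  # full HTML, can be passed to BeautifulSoup
driver.quit()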
If you are iterating over URLs, you can also add a delay with the time module:
from bs4 import BeautifulSoup
import requests, time
...
for url in urls:
    time.sleep(3)  # wait three seconds before each request
    response = requests.get(url, headers=headers)
    ...
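If a fixed delay still gets you flagged, a randomized pause makes the request pattern less mechanical. A small variation of the loop above; the 2 to 5 second range is just an example:
import random

for url in urls:
    time.sleep(random.uniform(2, 5))  # random pause between requests
    response = requests.get(url, headers=headers)
    ...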