Hope someone has the answer.
I'm trying to scrape a specific website.
The problem is that requests.get(url) scrapes everything very quickly, so I get blocked.
Is there a way to slow down requests.get(url)?
Thank you for your help.
from bs4 import BeautifulSoup
import requests
url = 'website.fr'
response = requests.get(url)
print(response)
Print result: "As you were using this website, something about your browser or behavior made us think you might be a bot. Solve the captcha below to continue browsing the site."
CodePudding user response:
You could configure a timeout and see if the scraper works for you. Note that timeout only caps how long requests waits for the server to respond; it does not add a delay between requests.
r = requests.get('https://github.com', timeout=5)
https://docs.python-requests.org/en/latest/user/advanced/#timeouts
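If the real goal is to space out repeated attempts, requests itself has no built-in delay, but you can mount urllib3's Retry with a backoff_factor on a Session. A minimal sketch; the retry count, backoff value, and status codes below are example choices, not requirements:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# retry up to 5 times, waiting progressively longer between attempts,
# whenever the server answers with one of the listed status codes
retries = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

r = session.get('https://github.com', timeout=5)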
CodePudding user response:
Note: Since there is no information about the actual URL, this is hard to reproduce.
A first approach could be to add some headers to your request. This will not slow anything down, but it makes the request look like it comes from a browser. An alternative approach is to use Selenium (see the sketch after the headers example below).
Example
from bs4 import BeautifulSoup
import requests
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
"X-Amzn-Trace-Id": "Root=1-61acac03-6279b8a6274777eb44d81aae",
"X-Client-Data": "CJW2yQEIpLbJAQjEtskBCKmdygEIuevKAQjr8ssBCOaEzAEItoXMAQjLicwBCKyOzAEI3I7MARiOnssB" }
url = 'https://www.france.fr/fr'
response = requests.get(url, headers = headers)
print(response)
Output
<Response [200]>
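For the Selenium alternative mentioned above, a rough sketch could look like the following. It drives a real browser, which many bot checks accept; the URL matches the example above, the 3-second pause is an arbitrary value, and depending on your Selenium version you may need a matching chromedriver on your PATH:
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get('https://www.france.fr/fr')
time.sleep(3)  # give the page time to render
html = driver.page_source  # full HTML, can be passed to BeautifulSoup
driver.quit()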
If you are iterating over URLs, you can also add a delay with the time module:
from bs4 import BeautifulSoup
import requests, time
...
for url in urls:
    time.sleep(3)  # wait three seconds before each request
    response = requests.get(url, headers=headers)
    ...
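If a fixed delay still gets you flagged, a randomized pause makes the request pattern less mechanical. A small variation of the loop above; the 2 to 5 second range is just an example:
import random

for url in urls:
    time.sleep(random.uniform(2, 5))  # random pause between requests
    response = requests.get(url, headers=headers)
    ...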