I am using BeautifulSoup (bs4) to scrape the "Browse Standards by Technology" section from this website: https://standards.globalspec.com/
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "https://standards.globalspec.com/"
q1 = urlopen(url)
soup = BeautifulSoup(q1, 'lxml')
print(soup)
But I get an error: urllib.error.HTTPError: HTTP Error 503: Service Temporarily Unavailable
Does anyone know what could be causing this error?
CodePudding user response:
@Samt94 has already pointed out that the website is behind Cloudflare protection, so you can use cloudscraper
instead of urllib:
from bs4 import BeautifulSoup
import cloudscraper
scraper = cloudscraper.create_scraper(delay=10, browser={'custom': 'ScraperBot/1.0',})
url = 'https://standards.globalspec.com/'
req = scraper.get(url)
print(req)
soup = BeautifulSoup(req.text,'lxml')
Output:
<Response [200]>
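If you want to confirm that the 503 really comes from Cloudflare and not from the site itself, you can catch the HTTPError and look at its headers. This is only a rough sketch: the helper names describe_http_error and diagnose are made up for illustration, and it assumes Cloudflare's responses carry a "Server: cloudflare" header, which they normally do.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def describe_http_error(err):
    """Summarise an HTTPError: status, reason, and whether Cloudflare answered."""
    # err.headers is a mapping-like object; header lookup is case-insensitive.
    server = (err.headers.get("Server") or "") if err.headers else ""
    return {
        "status": err.code,
        "reason": err.reason,
        "server": server,
        # Assumption: Cloudflare identifies itself in the Server header.
        "via_cloudflare": "cloudflare" in server.lower(),
    }


def diagnose(url):
    """Open url; return None on success, or a summary dict on an HTTP error."""
    try:
        with urlopen(url, timeout=30):
            return None
    except HTTPError as err:
        return describe_http_error(err)
```

Calling diagnose("https://standards.globalspec.com/") with plain urllib should then return a dict whose "status" is 503 if the request is still being blocked; if "via_cloudflare" is true, the block is coming from the protection layer and not from the origin server.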
CodePudding user response:
You can use cloudscraper to access websites that use Cloudflare DDoS protection:
from bs4 import BeautifulSoup
import cloudscraper
url = "https://standards.globalspec.com/"
scraper = cloudscraper.create_scraper()
q1 = scraper.get(url)
soup = BeautifulSoup(q1.text, 'lxml')
print(soup)