I'm writing a web-scraping script. I'm trying to fetch the HTML of a page on the Cambridge Dictionary website, but I get an error. I would really appreciate it if someone could explain the cause of the error and how to fix it.
Here is my code:
import requests
from bs4 import BeautifulSoup

def checkWord(word):
    url_top = "https://dictionary.cambridge.org/dictionary/english/"
    url = url_top + word
    headers = requests.utils.default_headers()
    headers.update(
        {
            'User-Agent': 'My User Agent 1.0',
        }
    )
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'html.parser')
    check = soup.find("title")
    boolean = check.string
    if boolean == "Cambridge English Dictionary: Meanings & Definitions":
        return False
    else:
        return True

word = "App"
checkWord(word)
However, an error occurred at the line html = requests.get(url, headers=headers).text
The error message is shown below:
Exception has occurred: ConnectionError
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
File "<string>", line 3, in raise_from
During handling of the above exception, another exception occurred:
File "<string>", line 3, in raise_from
During handling of the above exception, another exception occurred:
CodePudding user response:
Your code works fine for me every time. Most likely the problem is your local internet connection and is only temporary, so check your connection and try again. Here is the same code with the User-Agent set to 'Mozilla/5.0', printing the page title:
import requests
from bs4 import BeautifulSoup

def checkWord(word):
    url_top = "https://dictionary.cambridge.org/dictionary/english/"
    url = url_top + word
    headers = requests.utils.default_headers()
    headers.update(
        {
            'User-Agent': 'Mozilla/5.0',
        }
    )
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'html.parser')
    check = soup.find("title").text
    print(check)

word = "App"
checkWord(word)
Output:
APP | meaning, definition in Cambridge English Dictionary
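If the failure really is temporary, as suggested above, retrying the request a few times may be enough. The following is only a rough sketch: get_with_retries is a hypothetical helper (not part of requests), and the attempt count, delay, and 10-second timeout are arbitrary assumptions. The get and sleep parameters exist only so the helper can be tried out without touching the network.

```python
import time

import requests


def get_with_retries(url, headers=None, attempts=3, delay=1.0,
                     get=requests.get, sleep=time.sleep):
    """Call get(url, ...) up to `attempts` times, pausing between failures.

    Hypothetical helper: the defaults here are assumptions, not tuned values.
    """
    for attempt in range(1, attempts + 1):
        try:
            return get(url, headers=headers, timeout=10)
        except requests.exceptions.ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(delay * attempt)  # wait a little longer after each failure
```

In checkWord, the call requests.get(url, headers=headers).text could then become get_with_retries(url, headers=headers).text. If the server is deliberately dropping the connection, retrying will not help and the last ConnectionError is raised as before.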
CodePudding user response:
It looks like the remote host has blocked you. If you can still open the website in a web browser on the same computer, try changing the User-Agent to something like this:
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"