I'm trying to use urllib.request.urlopen()
like this:
import urllib.request
html = urllib.request.urlopen(url="https://www.otcmarkets.com")
But it doesn't work for some reason. If I pass a different HTTPS URL, such as 'https://www.google.com', it works, but for some websites like this one I can't open the URL. Is there something I can do to make it work? Or is there another way to extract the HTML from a website?
CodePudding user response:
The website appears to reject requests that don't carry a browser-like User-Agent header, and urllib sends "Python-urllib/3.x" by default. The requests package makes setting headers easy.
You can do what you want with something like this:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
resp = requests.get("https://www.otcmarkets.com", headers=headers)
print(resp.text)  # the page HTML as a string
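If you'd rather stay with the standard library, the same idea should work with urllib by wrapping the URL in a urllib.request.Request that carries the header. This is only a minimal sketch: the helper names build_request and fetch_html and the shortened User-Agent string are made up for illustration, and it assumes the site only checks the User-Agent.

```python
import urllib.request

# Hypothetical browser-like User-Agent; any realistic value may work.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object with an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Open the URL with the custom header and return the body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch_html("https://www.otcmarkets.com") should then return the page source as a string, provided the site accepts that header.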
CodePudding user response:
Try these methods:
Method 01:
import urllib.request
html_page = urllib.request.urlopen("https://www.otcmarkets.com")
Method 02:
from urllib.request import urlopen
html_page = urlopen("https://www.otcmarkets.com")
Method 03 (Using requests):
import requests
html_page = requests.get("https://www.otcmarkets.com")