HTTP Error 404: Not Found with an existing url

Time:11-06

I'm trying to read some data from the web, but I've run into an unexpected problem. I call it unexpected because if I print the URL I'm trying to read, it exists and opens without problems. However, when I run the code below, I get the error "HTTP Error 404: Not Found". But the URL exists (see here)... Does anyone know what I'm doing wrong? Thanks!

    import pandas as pd
    from bs4 import BeautifulSoup
    import urllib.request as ur

    index = 'MSFT'
    url_is = 'https://finance.yahoo.com/quote/' + index + '/financials?p=' + index
    # Read data
    read_data = ur.urlopen(url_is).read()

CodePudding user response:

Some sites require a valid "User-Agent" identifier header. In your example with urllib, as the URL parameter of urlopen can also be a Request object, you could specify the headers in the Request object along with the url, as below:

    from urllib.request import Request, urlopen

    index = 'MSFT'
    url_is = 'https://finance.yahoo.com/quote/' + index + '/financials?p=' + index
    # Pass a Request object so a User-Agent header is sent with the request
    req = Request(url_is, headers={'User-Agent': 'Mozilla/5.0'})
    html = urlopen(req).read()
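
To see why the header matters, it may help to look at what urllib sends by default. The following is only a minimal sketch: `build_request` and `fetch` are hypothetical helper names introduced for illustration, and the 10-second timeout is an assumption. It shows the default `Python-urllib/<version>` User-Agent, builds a Request with a browser-like one, and reports the status code if the server still rejects the request.

```python
from urllib.error import HTTPError
from urllib.request import Request, build_opener, urlopen

# urllib's default opener identifies itself as "Python-urllib/<version>",
# which some sites reject.
DEFAULT_HEADERS = dict(build_opener().addheaders)


def build_request(symbol, user_agent="Mozilla/5.0"):
    """Return a Request for the financials page with a custom User-Agent.

    Hypothetical helper; the URL pattern is the one used in the question.
    """
    url = f"https://finance.yahoo.com/quote/{symbol}/financials?p={symbol}"
    return Request(url, headers={"User-Agent": user_agent})


def fetch(symbol):
    """Fetch the page and print the status code if the server refuses."""
    try:
        # timeout=10 is an assumed value, not taken from the answer above
        with urlopen(build_request(symbol), timeout=10) as response:
            return response.read()
    except HTTPError as err:
        print(f"Request failed with HTTP {err.code}: {err.reason}")
        raise
```

Calling `fetch('MSFT')` would perform the actual download; whether the site accepts a given User-Agent string can change over time.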

CodePudding user response:

Using the requests module and sending a User-Agent header, the response status is 200:

    from bs4 import BeautifulSoup
    import requests

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36'}
    index = 'MSFT'
    url_is = 'https://finance.yahoo.com/quote/' + index + '/financials?p=' + index
    r = requests.get(url_is, headers=headers)
    print(r.status_code)
    # To parse the page afterwards:
    # page = BeautifulSoup(r.content, 'lxml')

Output:

200
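
If several pages are fetched, one possible variation is to set the header once on a `requests.Session` and let `raise_for_status()` turn a 404 into an exception instead of printing the code. This is a sketch under assumptions: `make_session` and `fetch_financials` are hypothetical names, and the short User-Agent string and timeout are illustrative values.

```python
import requests


def make_session(user_agent="Mozilla/5.0"):
    """Create a session that sends the same User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_financials(session, symbol, timeout=10):
    """Return the page HTML, raising requests.HTTPError on a 4xx/5xx status."""
    url = f"https://finance.yahoo.com/quote/{symbol}/financials"
    # params builds the "?p=<symbol>" query string used in the question
    response = session.get(url, params={"p": symbol}, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Usage would look like `html = fetch_financials(make_session(), 'MSFT')`.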