SEC EDGAR 13F source HTTPError: HTTP Error 403: Forbidden

Time:12-17

Please help. SEC EDGAR used to work flawlessly until now. It now gives HTTPError: HTTP Error 403: Forbidden

import pandas as pd
tables = pd.read_html("https://www.sec.gov/Archives/edgar/data/1541617/000110465920125814/xslForm13F_X01/infotable.xml")
df = tables[3]
df

CodePudding user response:

It looks like the site is rejecting your request because it detects that the request is automated. You can get around this by adding the header User-Agent: Mozilla/5.0 to the HTTP request, since that makes it look like the request is coming from a web browser. Unfortunately, pd.read_html does not support changing the request headers, so we have to make the request ourselves using the requests library.

Install requests with pip install requests

Then change your code to look like this:

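import io

import pandas as pd
import requests

# Make the request ourselves so we can set the User-Agent header
response = requests.get("https://www.sec.gov/Archives/edgar/data/1541617/000110465920125814/xslForm13F_X01/infotable.xml", headers={"User-Agent": "Mozilla/5.0"})

# Stop with a clear error if the site still refuses the request (e.g. 403)
response.raise_for_status()

# Wrap the HTML text in a file-like object and pass it to read_html
tables = pd.read_html(io.StringIO(response.text))

df = tables[3]
print(df)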
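
If installing requests is not an option, the same idea can probably be sketched with urllib from the standard library. The helper names build_request and fetch_html below are made up for illustration, and this is only a minimal sketch that assumes the server accepts the same header:

```python
from urllib.request import Request, urlopen


def build_request(url: str, user_agent: str) -> Request:
    """Return a Request object that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, user_agent: str, timeout: float = 30.0) -> str:
    """Download `url` and return the body as text.

    urlopen raises urllib.error.HTTPError if the server answers 403.
    """
    with urlopen(build_request(url, user_agent), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

The string returned by fetch_html(url, "Mozilla/5.0") can then be handed to pd.read_html in the same way as the text of the requests response.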

One thing I have noticed about the site is that it does not allow requests from non-residential IP addresses and will always give them a 403. So if you are running this code somewhere in the cloud (such as repl.it), through a VPN, or similar, it will not work at all. On my home computer this code works perfectly, though. The site also says that it will block your IP address if you make more than 10 requests per second or an excessive number of requests overall, so be careful about how often you make requests to the website.
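
To stay under the request limit mentioned above when fetching several filings in a loop, one option is to pause between requests. The Throttle class below is a hypothetical helper, not part of requests or pandas, and 5 requests per second is just an assumed safety margin below the stated limit of 10:

```python
import time


class Throttle:
    """Make sure consecutive wait() calls are at least min_interval seconds apart."""

    def __init__(self, max_per_second: float) -> None:
        self.min_interval = 1.0 / max_per_second
        self._last_call = None  # time of the previous wait(), if any

    def wait(self) -> None:
        now = time.monotonic()
        if self._last_call is not None:
            # Sleep only for the part of the interval that has not passed yet
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


# Assumed margin: 5 requests per second, well below the stated limit of 10
throttle = Throttle(max_per_second=5)
```

Calling throttle.wait() right before each requests.get(...) in the loop spaces the requests out. It does not help with the overall request volume, so it is still worth fetching only the pages you need.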
