How to download dictionary from url?

Time:10-13

I would like to download the dictionary from the following url: https://data.sec.gov/api/xbrl/companyfacts/CIK0000320193.json

I want to extract some of the data from the dict into a pandas DataFrame. It should look like this:

filed_date   filed_periode   form    accn
2020-11-01   Q4              10-K    0001193125-15-153166
2020-08-01   Q3              10-Q    0001193125-15-153112

I was able to load a dict from another SEC link with the following code:

import pandas as pd
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
import json

url1 = 'https://www.sec.gov/files/company_tickers_exchange.json'

# Parse the whole response body as JSON instead of keeping only the last line
with urllib.request.urlopen(url1) as response:
    company_dict = json.load(response)

If I use the above code for the first URL, I get the following error:

HTTPError: HTTP Error 403: Forbidden

I also tried the following approach, but I get the same error:

import urllib.request

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = "https://data.sec.gov/api/xbrl/companyfacts/CIK0000320193.json"
headers = {'User-Agent': user_agent}

request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
data = response.read()

Thank you in advance for any pointer :-)

CodePudding user response:

The SEC has an unusual requirement for user agent strings. It wants them in the format "Sample Company Name AdminContact@<sample company domain>.com", so a browser-style user agent like the one in your second attempt still gets rejected with a 403.

So for me, a compliant user agent would be:

user_agent = 'Dan Monego <myemail>@<emailservice>'

Change the user agent to include your name and email.
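
Putting it together, here is a minimal sketch using only the standard library. The contact string is a hypothetical placeholder, and the helper names (build_request, fetch_json, filing_rows) are mine, not part of any SEC or Python API. The flattening step assumes the JSON is nested as facts -> taxonomy -> concept -> units -> list of entries, with each entry carrying "filed", "fp", "form" and "accn" keys; check that against the actual response before relying on it.

```python
import json
import urllib.request

# Hypothetical contact string: replace with your own name and email.
CONTACT = "Jane Doe jane.doe@example.com"
URL = "https://data.sec.gov/api/xbrl/companyfacts/CIK0000320193.json"


def build_request(url, contact):
    """Return a Request whose User-Agent identifies the caller."""
    return urllib.request.Request(url, headers={"User-Agent": contact})


def fetch_json(url, contact):
    """Download the URL and parse the body as JSON (performs a network call)."""
    with urllib.request.urlopen(build_request(url, contact)) as response:
        return json.load(response)


def filing_rows(company_facts):
    """Flatten the assumed facts -> taxonomy -> concept -> units layout into rows."""
    rows = []
    for concepts in company_facts.get("facts", {}).values():
        for concept in concepts.values():
            for entries in concept.get("units", {}).values():
                for entry in entries:
                    # Column names follow the table in the question.
                    rows.append({
                        "filed_date": entry.get("filed"),
                        "filed_periode": entry.get("fp"),
                        "form": entry.get("form"),
                        "accn": entry.get("accn"),
                    })
    return rows
```

With these helpers, pd.DataFrame(filing_rows(fetch_json(URL, CONTACT))).drop_duplicates() should give a frame with the four columns you are after. The drop_duplicates() call is there because, under the layout assumed above, the same filing shows up once for every fact it reports.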
