Home > Net >  Python - Issue while trying to read JSON file from API Query
Python - Issue while trying to read JSON file from API Query

Time:10-06

I am querying the NIH Reporter API (https://api.reporter.nih.gov/), and I was able to get an output which is formatted as a JSON. My code for the query looks like:

params={
 "criteria":
{
 "publicationsSearch": {
      "publicationsTextSearch": "string",
      "pmId": [
        "21444817"
      ]
     } 
 }
,
 "include_fields": [
        "ApplId","SubprojectId","ProjectNum","FiscalYear", "AwardAmount"
     ],
 "offset":0,
 "limit":500,
 "sort_order":"desc"
 }
 

response = requests.post("https://api.reporter.nih.gov/v2/projects/search", json=params)

print(response.status_code)

print(response.text)

and response.text returns something that looks like:

{"meta":{"search_id":"-nePJuJYiUaI8MaRiLmp3Q","total":2663912,"offset":0,"limit":500,"sort_field":null,"sort_order":"desc","sorted_by_relevance":false,"properties":{"URL":"https:/reporter.nih.gov/search/-nePJuJYiUaI8MaRiLmp3Q/projects"}},"results":[{"appl_id":3949054,"subproject_id":"0230","fiscal_year":1987,"project_num":"5M01RR000240-23","award_amount":null},{"appl_id":7599150,"subproject_id":"9016","fiscal_year":2008,"project_num":"5P30CA043703-19","award_amount":205077}]}

My ultimate goal is to save this as a dataframe.

To that end, I am trying to save the output from request.text as a JSON file and then read it with pandas (is there a better way to do that?). My code looks like:

with open("sample.json", "w") as outfile:
    json.dump(response.text, outfile)

df =pd.DataFrame("sample.json")

My sense is that sample.json was indeed saved (I can open it and it looks like a JSON file), but the last line throws me the following error: "DataFrame constructor not properly called!", and I honestly have no idea how I can solve this issue...

CodePudding user response:

Instead of saving to disk and reading from disk, I would create a DataFrame object using pandas.DataFrame.from_dict. Does the following work?

df = pd.DataFrame.from_dict(response.json()['results'])

It calls response.json(), which should parse and convert the response's text into a Python dict object. You may also want to set up some error handling in case data retrieval fails or returns an unexpected result (e.g., text not in JSON format may cause response.json() to fail).

  • Related