Remove outer array from JSON in Python dataframe

Time:11-02

I am using the following code to create an Excel file from an API JSON:

import requests
import pandas as pd

df = pd.read_json("my API url")
df.to_excel("myFileLocation.xlsx")

The problem is that my JSON doesn't parse correctly because the API wraps the records in an "outer array". I've tested this by manually editing the JSON to remove the "outer array" wrapper, and it then parses correctly:

{"outer array": [{"Header1": "Value1", "Header2": "Value2"}, {"Header1": "Value3", "Header2": "Value4"}]}

How can I update my existing code to remove this outer array?

Answer:

Assuming the JSON returned by your URL looks like this:

d = '{"outer array": [{"Header1": "Value1", "Header2": "Value2"}, {"Header1": "Value3", "Header2": "Value4"}]}'

You can parse the JSON yourself and pass the inner list to pd.DataFrame:

import json
import requests
import pandas as pd

# Response.json() already decodes the body into a Python dict,
# so the inner list can be indexed directly
d = requests.get("http://my/api/url/").json()
df = pd.DataFrame(d["outer array"])

# If d is instead the raw JSON text (like the string above), decode it first:
# df = pd.DataFrame(json.loads(d)["outer array"])

>>> df
  Header1 Header2
0  Value1  Value2
1  Value3  Value4
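To try the unwrapping without a live API, the same idea can be run on the sample string shown above. This is a minimal sketch; `unwrap_records` is a hypothetical helper introduced here for illustration, and the hard-coded string stands in for the real response body.

```python
import json

import pandas as pd

# Sample payload standing in for the API response text (assumption: same shape as the question)
d = '{"outer array": [{"Header1": "Value1", "Header2": "Value2"}, {"Header1": "Value3", "Header2": "Value4"}]}'


def unwrap_records(text, key="outer array"):
    """Hypothetical helper: decode JSON text and return the list stored under `key`."""
    return json.loads(text)[key]


# Each dict in the list becomes one row; its keys become the columns
df = pd.DataFrame(unwrap_records(d))
print(df)
```

From there, df.to_excel("myFileLocation.xlsx") works as in the question (writing .xlsx files needs an Excel engine such as openpyxl installed).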

Answer:

You'll need to do some manipulation before passing the data to pandas, which unfortunately means you'll need to fetch the data from the URL yourself rather than letting pd.read_json do it.

You can do this with string manipulation, or by parsing the text into a dictionary and accessing the value. Using string manipulation:

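from io import StringIO
import pandas as pd
import requests

response = requests.get("my API url")
# removeprefix/removesuffix need Python 3.9+; the prefix must match the text exactly
inner = response.text.removeprefix('{"outer array": ').removesuffix('}')
df = pd.read_json(StringIO(inner))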
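The dictionary route mentioned above avoids depending on the exact spacing of the prefix string. A minimal sketch, using pandas' json_normalize with record_path (a swapped-in alternative, not the code this answer uses) and a hard-coded sample standing in for response.text:

```python
import json

import pandas as pd

# Sample payload standing in for response.text (assumption: same shape as the question)
text = '{"outer array": [{"Header1": "Value1", "Header2": "Value2"}, {"Header1": "Value3", "Header2": "Value4"}]}'

# Parse into a dict, then let json_normalize pull out the list stored under "outer array"
data = json.loads(text)
df = pd.json_normalize(data, record_path="outer array")
print(df)
```

With a live API, data would come from requests.get(...).json() instead of json.loads, and whitespace differences in the response no longer matter.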

You may need to install Requests:

python -m pip install requests