Home > other >  Convert Nested DateTime json to Pandas DataFrame
Convert Nested DateTime json to Pandas DataFrame

Time:05-11

The problem here is different than those defined in the questions here, here, and here. Specifically, the transformation and form of the output I want is different than any specified in those questions, and I also want a DateTime index. These differences cause the answers on those pages fail.

I have data formatted as a json like this:

{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "ABC",
    },
    "Time Series (Daily)": {
        "2001-06-31": {
            "1. open": "113.2000",
            "4. close": "113.8000",
        },
        "2001-07-01": {
            "1. open": "114.2000",
            "4. close": "114.2000",
        }
    }
}

I want the output to look be a pandas dataframe like this:

"Time Series (Daily)" | "1. open" | "2. close"
"2001-06-31"          | 113.2000  | 113.8000
"2001-07-01"          | 114.2000  | 114.2000

I wrote a function that works, but the for loop leaves performance wanting and I find it hard to read.

def convert_json_to_dataframe(all_json_data):
    json_data = all_json_data["Time Series (Daily)"]
    dates = []
    open = []
    close = []
    for key in json_data.keys():
        dates.append(key)
        open.append(json_data[key]["1. open"])
        close.append(json_data[key]["4. close"])
    df = pd.DataFrame(
        list(zip(open, close)),
        columns=["1. open", "2. close"],
        index=dates,
    )
    df = df.apply(pd.to_numeric, errors="ignore")
    return df

There's got to be a simpler, easier to read, higher-performing way to do this, maybe with json_normalize in pandas, but I can't figure it out.


Update after answers: All I had to do was:

df = pd.DataFrame(json_data["Time Series (Daily)"]).T

Transpose is my new favorite pandas method.

Pandas discovered the index and column names automatically, so I didn't need the reset_index approach. I got an unexpected keyword argument error when I tried the orient approach, though I expect that's because this environment is using pandas 0.22.

Thank you for the answers everyone.

CodePudding user response:

Why don't you just do this?

pd.DataFrame(data["Time Series (Daily)"]).T.reset_index().rename(columns = {"index":"Time Series (Daily)"})

Output -

Time Series (Daily) 1. open 4. close
0 2001-06-31 113.2000 113.8000
1 2001-07-01 114.2000 114.2000

CodePudding user response:

It seems the relevant data is only under "Time Series (Daily)" key, so you could get that and construct a DataFrame (use the orient parameter to get it in the correct shape):

out = pd.DataFrame.from_dict(my_data['Time Series (Daily)'], orient='index')

Output:

            1. open  4. close
2001-06-31  113.2000  113.8000
2001-07-01  114.2000  114.2000

CodePudding user response:

Taking json from an URL is the easiest way:

import requests
url='url from json file'
r=requests.get(url)
data=r.json()
df=pd.DataFrame(data['Time Series (Daily)']).T
df=df.reset_index('Time Series (Daily)')

Saying “thanks” is appreciated, but it doesn’t answer the question. Instead, vote up the answers that helped you the most! If these answers were helpful to you, please consider saying thank you in a more constructive way – by contributing your own answers to questions your peers have asked here.

  • Related