The problem here is different than those defined in the questions here, here, and here. Specifically, the transformation and form of the output I want is different than any specified in those questions, and I also want a DateTime index. These differences cause the answers on those pages fail.
I have data formatted as a json like this:
{
"Meta Data": {
"1. Information": "Daily Prices (open, high, low, close) and Volumes",
"2. Symbol": "ABC",
},
"Time Series (Daily)": {
"2001-06-31": {
"1. open": "113.2000",
"4. close": "113.8000",
},
"2001-07-01": {
"1. open": "114.2000",
"4. close": "114.2000",
}
}
}
I want the output to look be a pandas dataframe like this:
"Time Series (Daily)" | "1. open" | "2. close"
"2001-06-31" | 113.2000 | 113.8000
"2001-07-01" | 114.2000 | 114.2000
I wrote a function that works, but the for
loop leaves performance wanting and I find it hard to read.
def convert_json_to_dataframe(all_json_data):
json_data = all_json_data["Time Series (Daily)"]
dates = []
open = []
close = []
for key in json_data.keys():
dates.append(key)
open.append(json_data[key]["1. open"])
close.append(json_data[key]["4. close"])
df = pd.DataFrame(
list(zip(open, close)),
columns=["1. open", "2. close"],
index=dates,
)
df = df.apply(pd.to_numeric, errors="ignore")
return df
There's got to be a simpler, easier to read, higher-performing way to do this, maybe with json_normalize
in pandas, but I can't figure it out.
Update after answers: All I had to do was:
df = pd.DataFrame(json_data["Time Series (Daily)"]).T
Transpose is my new favorite pandas method.
Pandas discovered the index and column names automatically, so I didn't need the reset_index
approach. I got an unexpected keyword argument
error when I tried the orient
approach, though I expect that's because this environment is using pandas 0.22.
Thank you for the answers everyone.
CodePudding user response:
Why don't you just do this?
pd.DataFrame(data["Time Series (Daily)"]).T.reset_index().rename(columns = {"index":"Time Series (Daily)"})
Output -
Time Series (Daily) | 1. open | 4. close | |
---|---|---|---|
0 | 2001-06-31 | 113.2000 | 113.8000 |
1 | 2001-07-01 | 114.2000 | 114.2000 |
CodePudding user response:
It seems the relevant data is only under "Time Series (Daily)"
key, so you could get that and construct a DataFrame (use the orient
parameter to get it in the correct shape):
out = pd.DataFrame.from_dict(my_data['Time Series (Daily)'], orient='index')
Output:
1. open 4. close
2001-06-31 113.2000 113.8000
2001-07-01 114.2000 114.2000
CodePudding user response:
Taking json from an URL is the easiest way:
import requests
url='url from json file'
r=requests.get(url)
data=r.json()
df=pd.DataFrame(data['Time Series (Daily)']).T
df=df.reset_index('Time Series (Daily)')
Saying “thanks” is appreciated, but it doesn’t answer the question. Instead, vote up the answers that helped you the most! If these answers were helpful to you, please consider saying thank you in a more constructive way – by contributing your own answers to questions your peers have asked here.