I would like to ask for some help with this .json file that I have. I have looked through the pd.json_normalize() method extensively but I was not able to get the formatting right.
The line of code I was staring to experiment with was this ''' result_df = pd.json_normalize(cgcryptohistory_data) '''
I would love to get my json formatted into a df that is formatted as such:
date | bitcoin prices | bitcoin market_caps | bitcoin total_volumes | ethereum prices | ethereum market_caps | ethereum total_volumes | |
---|---|---|---|---|---|---|---|
1637920962758 | 55084.24409740329 | 1040185692035.8112 | 4096.986983019884 | ... | ... | ||
1637924583096 | ... | ... | ... | ... | ... | ... |
I have been looking at this documentation but was not able to make it work with the unnamed nested values. https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html https://www.kaggle.com/jboysen/quick-tutorial-flatten-nested-json-in-pandas/notebook
[
[
{
"crypto": "bitcoin"
}
],
{
"prices": [
[
1637920962758,
55084.24409740329
],
[
1637924583096,
54657.9826454445
],
[
1637928143387,
54031.99796233907
],
[
1638524408000,
56556.355173823926
]
],
"market_caps": [
[
1637920962758,
1040185692035.8112
],
[
1637924583096,
1032137732028.0712
],
[
1637928143387,
1020318960913.6139
],
[
1638524408000,
1068341065780.2579
]
],
"total_volumes": [
[
1637920962758,
40002799175.46155
],
[
1637924583096,
38579701553.8867
],
[
1637928143387,
39373185822.85809
],
[
1638524408000,
32567680716.236423
]
]
},
[
{
"crypto": "ethereum"
}
],
{
"prices": [
[
1637920951704,
4096.986983019884
],
[
1637924408082,
4072.6963895955864
],
[
1637928090810,
4021.2930336538925
],
[
1638524390000,
4559.839444343959
]
],
"market_caps": [
[
1637920951704,
485474079335.9266
],
[
1637924408082,
482758573953.61304
],
[
1637928090810,
479260985689.3548
],
[
1638524390000,
540740261905.95264
]
],
"total_volumes": [
[
1637920951704,
25972933719.35031
],
[
1637924408082,
26468521371.13646
],
[
1637928090810,
27042124946.11916
],
[
1638524390000,
20268892519.524815
]
]
}
]
<iframe name="sif1" sandbox="allow-forms allow-modals allow-scripts" frameborder="0"></iframe>
CodePudding user response:
Assume js is your json, here's how i would have done it.
l = []
for i in range(0,len(js),2):
prices = [k[1] for k in js[i 1]["prices"]]
market_caps = [k[1] for k in js[i 1]["market_caps"]]
total_volumes = [k[1] for k in js[i 1]["total_volumes"]]
date = [k[0] for k in js[i 1]["total_volumes"]]
crypto = js[i][0]["crypto"]
df = pd.DataFrame({"crypto":crypto,"prices":prices,"market_caps":market_caps,"total_volumes":total_volumes,"date":date})
l.append(df)
df = pd.concat(l)
Ouput:
crypto prices market_caps total_volumes date
0 bitcoin 55084.244097 1.040186e 12 4.000280e 10 1637920962758
1 bitcoin 54657.982645 1.032138e 12 3.857970e 10 1637924583096
2 bitcoin 54031.997962 1.020319e 12 3.937319e 10 1637928143387
3 bitcoin 56556.355174 1.068341e 12 3.256768e 10 1638524408000
0 ethereum 4096.986983 4.854741e 11 2.597293e 10 1637920951704
1 ethereum 4072.696390 4.827586e 11 2.646852e 10 1637924408082
2 ethereum 4021.293034 4.792610e 11 2.704212e 10 1637928090810
3 ethereum 4559.839444 5.407403e 11 2.026889e 10 1638524390000
This way it's more scalable, and you can just filter en the crypto you want like this :
df[df.crypto == "bitcoin"]
output
crypto prices market_caps total_volumes date
0 bitcoin 55084.244097 1.040186e 12 4.000280e 10 1637920962758
1 bitcoin 54657.982645 1.032138e 12 3.857970e 10 1637924583096
2 bitcoin 54031.997962 1.020319e 12 3.937319e 10 1637928143387
3 bitcoin 56556.355174 1.068341e 12 3.256768e 10 1638524408000