Home > Back-end >  Nested JSON unpack into PD dataframe
Nested JSON unpack into PD dataframe

Time:12-04

I would like to ask for some help with this .json file that I have. I have looked through the pd.json_normalize() method extensively but I was not able to get the formatting right.

The line of code I was staring to experiment with was this ''' result_df = pd.json_normalize(cgcryptohistory_data) '''

I would love to get my json formatted into a df that is formatted as such:

date bitcoin prices bitcoin market_caps bitcoin total_volumes ethereum prices ethereum market_caps ethereum total_volumes
1637920962758 55084.24409740329 1040185692035.8112 4096.986983019884 ... ...
1637924583096 ... ... ... ... ... ...

I have been looking at this documentation but was not able to make it work with the unnamed nested values. https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html https://www.kaggle.com/jboysen/quick-tutorial-flatten-nested-json-in-pandas/notebook

[
  [
    {
      "crypto": "bitcoin"
    }
  ],
  {
    "prices": [
      [
        1637920962758,
        55084.24409740329
      ],
      [
        1637924583096,
        54657.9826454445
      ],
      [
        1637928143387,
        54031.99796233907
      ],
      [
        1638524408000,
        56556.355173823926
      ]
    ],
    "market_caps": [
      [
        1637920962758,
        1040185692035.8112
      ],
      [
        1637924583096,
        1032137732028.0712
      ],
      [
        1637928143387,
        1020318960913.6139
      ],
      [
        1638524408000,
        1068341065780.2579
      ]
    ],
    "total_volumes": [
      [
        1637920962758,
        40002799175.46155
      ],
      [
        1637924583096,
        38579701553.8867
      ],
      [
        1637928143387,
        39373185822.85809
      ],
      [
        1638524408000,
        32567680716.236423
      ]
    ]
  },
  [
    {
      "crypto": "ethereum"
    }
  ],
  {
    "prices": [
      [
        1637920951704,
        4096.986983019884
      ],
      [
        1637924408082,
        4072.6963895955864
      ],
      [
        1637928090810,
        4021.2930336538925
      ],
      [
        1638524390000,
        4559.839444343959
      ]
    ],
    "market_caps": [
      [
        1637920951704,
        485474079335.9266
      ],
      [
        1637924408082,
        482758573953.61304
      ],
      [
        1637928090810,
        479260985689.3548
      ],
      [
        1638524390000,
        540740261905.95264
      ]
    ],
    "total_volumes": [
      [
        1637920951704,
        25972933719.35031
      ],
      [
        1637924408082,
        26468521371.13646
      ],
      [
        1637928090810,
        27042124946.11916
      ],
      [
        1638524390000,
        20268892519.524815
      ]
    ]
  }
]
<iframe name="sif1" sandbox="allow-forms allow-modals allow-scripts" frameborder="0"></iframe>

CodePudding user response:

Assume js is your json, here's how i would have done it.

l = []
for i in range(0,len(js),2):
    prices = [k[1] for k in js[i 1]["prices"]]
    market_caps = [k[1] for k in js[i 1]["market_caps"]]
    total_volumes = [k[1] for k in js[i 1]["total_volumes"]]
    date =  [k[0] for k in js[i 1]["total_volumes"]]
    crypto =  js[i][0]["crypto"]
    df = pd.DataFrame({"crypto":crypto,"prices":prices,"market_caps":market_caps,"total_volumes":total_volumes,"date":date})
    l.append(df)
df = pd.concat(l)

Ouput:

     crypto        prices   market_caps  total_volumes           date
0   bitcoin  55084.244097  1.040186e 12   4.000280e 10  1637920962758
1   bitcoin  54657.982645  1.032138e 12   3.857970e 10  1637924583096
2   bitcoin  54031.997962  1.020319e 12   3.937319e 10  1637928143387
3   bitcoin  56556.355174  1.068341e 12   3.256768e 10  1638524408000
0  ethereum   4096.986983  4.854741e 11   2.597293e 10  1637920951704
1  ethereum   4072.696390  4.827586e 11   2.646852e 10  1637924408082
2  ethereum   4021.293034  4.792610e 11   2.704212e 10  1637928090810
3  ethereum   4559.839444  5.407403e 11   2.026889e 10  1638524390000

This way it's more scalable, and you can just filter en the crypto you want like this :

df[df.crypto == "bitcoin"]

output

    crypto        prices   market_caps  total_volumes           date
0  bitcoin  55084.244097  1.040186e 12   4.000280e 10  1637920962758
1  bitcoin  54657.982645  1.032138e 12   3.857970e 10  1637924583096
2  bitcoin  54031.997962  1.020319e 12   3.937319e 10  1637928143387
3  bitcoin  56556.355174  1.068341e 12   3.256768e 10  1638524408000
  • Related