Home > database >  Parsing JSON with number as key usng pandas
Parsing JSON with number as key usng pandas

Time:11-13

My question is similar to this one - Parsing JSON with number as key and this one Parsing nested JSON except that I want to parse my JSON file with pandas normalize. Eventually I want to write this dataframe into an excel file( and I have the code to do that) . The dataframe I want in memory should look like this

Timestamp  BatteryVoltage GridCurrent GridVoltage InverterCurrent InverterVoltage 

....

....

The current code that I have does not help me at all -

import json
import datetime
import pandas as pd
from pandas.io.json import json_normalize

with open('test.json') as data_file:
     data = json.load(data_file)


  

df = pd.json_normalize(data['timestamp'])

I know I have to give something as argument to json_normalize record_path but I am not sure what it is going to be as the value of timestamp keeps changing.

{"timestamp": {
"1636987025": {
  "batteryVoltage": 28.74732,
  "gridCurrent": 3.68084,
  "gridVoltage": 230.64401,
  "inverterCurrent": 2.00471,
  "inverterVoltage": 224.18573,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987008
},
"1636987085": {
  "batteryVoltage": 28.52959,
  "gridCurrent": 3.40046,
  "gridVoltage": 230.41367,
  "inverterCurrent": 1.76206,
  "inverterVoltage": 225.24319,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987136
},
"1636987146": {
  "batteryVoltage": 28.5338,
  "gridCurrent": 3.37573,
  "gridVoltage": 229.27209,
  "inverterCurrent": 2.11128,
  "inverterVoltage": 225.51733,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987136
},
"1636987206": {
  "batteryVoltage": 28.55535,
  "gridCurrent": 3.43365,
  "gridVoltage": 229.47604,
  "inverterCurrent": 1.98594,
  "inverterVoltage": 225.83649,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987264
  }
 }
}

CodePudding user response:

pandas' json_normalize is designed for json/dicts that lists embedded in them; your data does not have that.

One option is with jmespath:

# pip install jmespath

expression = jmespath.compile("timestamp.*")
pd.DataFrame(expression.search(data)) 
   batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue
0        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008
1        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136
2        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136
3        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264

have a look at the docs for jmespath; it is a wonderful tool that can be handy in wrangling json.

Another option, without jmespath and just pure python:

result = [value for _, value in data['timestamp'].items()]
pd.DataFrame(result)
   batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue
0        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008
1        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136
2        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136
3        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264

To capture the timestamp is easy:

# for python 3.9, you can use | instead for the dictionary merging
result = [{**value, **{'timestamp':key}} for key, value in data['timestamp'].items()]

pd.DataFrame(result)
   batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue   timestamp
0        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008  1636987025
1        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136  1636987085
2        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136  1636987146
3        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264  1636987206

Another option:

pd.DataFrame.from_dict(data['timestamp'], orient='index').rename_axis('timestamp').reset_index()
    timestamp  batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue
0  1636987025        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008
1  1636987085        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136
2  1636987146        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136
3  1636987206        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264
  • Related