My question is similar to this one - Parsing JSON with number as key and this one Parsing nested JSON except that I want to parse my JSON file with pandas normalize. Eventually I want to write this dataframe into an excel file( and I have the code to do that) . The dataframe I want in memory should look like this
Timestamp BatteryVoltage GridCurrent GridVoltage InverterCurrent InverterVoltage
....
....
The current code that I have does not help me at all -
import json
import datetime
import pandas as pd
from pandas.io.json import json_normalize
with open('test.json') as data_file:
data = json.load(data_file)
df = pd.json_normalize(data['timestamp'])
I know I have to give something as argument to json_normalize record_path but I am not sure what it is going to be as the value of timestamp keeps changing.
{"timestamp": {
"1636987025": {
"batteryVoltage": 28.74732,
"gridCurrent": 3.68084,
"gridVoltage": 230.64401,
"inverterCurrent": 2.00471,
"inverterVoltage": 224.18573,
"solarCurrent": 0,
"solarVoltage": 0,
"tValue": 1636987008
},
"1636987085": {
"batteryVoltage": 28.52959,
"gridCurrent": 3.40046,
"gridVoltage": 230.41367,
"inverterCurrent": 1.76206,
"inverterVoltage": 225.24319,
"solarCurrent": 0,
"solarVoltage": 0,
"tValue": 1636987136
},
"1636987146": {
"batteryVoltage": 28.5338,
"gridCurrent": 3.37573,
"gridVoltage": 229.27209,
"inverterCurrent": 2.11128,
"inverterVoltage": 225.51733,
"solarCurrent": 0,
"solarVoltage": 0,
"tValue": 1636987136
},
"1636987206": {
"batteryVoltage": 28.55535,
"gridCurrent": 3.43365,
"gridVoltage": 229.47604,
"inverterCurrent": 1.98594,
"inverterVoltage": 225.83649,
"solarCurrent": 0,
"solarVoltage": 0,
"tValue": 1636987264
}
}
}
CodePudding user response:
pandas' json_normalize is designed for json/dicts that lists embedded in them; your data does not have that.
One option is with jmespath
:
# pip install jmespath
expression = jmespath.compile("timestamp.*")
pd.DataFrame(expression.search(data))
batteryVoltage gridCurrent gridVoltage inverterCurrent inverterVoltage solarCurrent solarVoltage tValue
0 28.74732 3.68084 230.64401 2.00471 224.18573 0 0 1636987008
1 28.52959 3.40046 230.41367 1.76206 225.24319 0 0 1636987136
2 28.53380 3.37573 229.27209 2.11128 225.51733 0 0 1636987136
3 28.55535 3.43365 229.47604 1.98594 225.83649 0 0 1636987264
have a look at the docs for jmespath; it is a wonderful tool that can be handy in wrangling json.
Another option, without jmespath and just pure python:
result = [value for _, value in data['timestamp'].items()]
pd.DataFrame(result)
batteryVoltage gridCurrent gridVoltage inverterCurrent inverterVoltage solarCurrent solarVoltage tValue
0 28.74732 3.68084 230.64401 2.00471 224.18573 0 0 1636987008
1 28.52959 3.40046 230.41367 1.76206 225.24319 0 0 1636987136
2 28.53380 3.37573 229.27209 2.11128 225.51733 0 0 1636987136
3 28.55535 3.43365 229.47604 1.98594 225.83649 0 0 1636987264
To capture the timestamp is easy:
# for python 3.9, you can use | instead for the dictionary merging
result = [{**value, **{'timestamp':key}} for key, value in data['timestamp'].items()]
pd.DataFrame(result)
batteryVoltage gridCurrent gridVoltage inverterCurrent inverterVoltage solarCurrent solarVoltage tValue timestamp
0 28.74732 3.68084 230.64401 2.00471 224.18573 0 0 1636987008 1636987025
1 28.52959 3.40046 230.41367 1.76206 225.24319 0 0 1636987136 1636987085
2 28.53380 3.37573 229.27209 2.11128 225.51733 0 0 1636987136 1636987146
3 28.55535 3.43365 229.47604 1.98594 225.83649 0 0 1636987264 1636987206
Another option:
pd.DataFrame.from_dict(data['timestamp'], orient='index').rename_axis('timestamp').reset_index()
timestamp batteryVoltage gridCurrent gridVoltage inverterCurrent inverterVoltage solarCurrent solarVoltage tValue
0 1636987025 28.74732 3.68084 230.64401 2.00471 224.18573 0 0 1636987008
1 1636987085 28.52959 3.40046 230.41367 1.76206 225.24319 0 0 1636987136
2 1636987146 28.53380 3.37573 229.27209 2.11128 225.51733 0 0 1636987136
3 1636987206 28.55535 3.43365 229.47604 1.98594 225.83649 0 0 1636987264