Home > Software engineering >  Converting nested JSON object to pandas Dataframe
Converting nested JSON object to pandas Dataframe

Time:11-09

I am dealing with a JSON file which has nested fields (arrays). I am trying to convert the same into a Pandas dataframe.

{
    "_id": "2026",
    "dataDate": 1537920000000,
    "dataYear": 2018,
    "groupId": "1378",
    "HourConsumed": 19781.4,
    "HourGenerated": 0,
    "max": 4658.400000000001,
    "maxGen": 0,
    "maxTime": 1538001000000,
    "avg": -206.05625,
    "max": 0,
    "maxGen": 0,
    "maxTime": null,
    "avgTemp": 0,
    "me_Id": "2004506_3166155129",
    "interval": 15,
    "intervalMetaData": [
        "whC",
        "whG",
        "max",
        "maxGen",
        "hC",
        "hG",
        "maxVar",
        "maxGen",
        "avgTemp",
        "eventTime"
    ],
    "intervalData": [
        [
            175.2,
            0,
            700.8,
            0,
            0,
            0,
            0,
            0,
            0,
            1537920900000
        ],
        [
            192,
            0,
            768,
            0,
            0,
            0,
            0,
            0,
            0,
            1537921800000
        ],
        [
            191.39999999999998,
            0,
            765.5999999999999,
            0,
            0,
            0,
            0,
            0,
            0,
            1537922700000
        ]
    ]
}

I need to create separate columns of what's inside intervalMetaData, and then filling those columns with values from intervalData. Is it possible?

CodePudding user response:

You bet it's possible! It's as simple as this:

df = pd.DataFrame(j['intervalData'], columns=j['intervalMetaData'])

CodePudding user response:

If I understand correctly, you just have to set your columns right by importing your list of lists with pandas:

import pandas as pd

data = {
    "_id": "2026",
    "dataDate": 1537920000000,
    "dataYear": 2018,
    "groupId": "1378",
    "HourConsumed": 19781.4,
    "HourGenerated": 0,
    "max": 4658.400000000001,
    "maxGen": 0,
    "maxTime": 1538001000000,
    "avg": -206.05625,
    "max": 0,
    "maxGen": 0,
    "maxTime": None,
    "avgTemp": 0,
    "me_Id": "2004506_3166155129",
    "interval": 15,
    "intervalMetaData": [
        "whC",
        "whG",
        "max",
        "maxGen",
        "hC",
        "hG",
        "maxVar",
        "maxGen",
        "avgTemp",
        "eventTime"
    ],
    "intervalData": [
        [
            175.2,
            0,
            700.8,
            0,
            0,
            0,
            0,
            0,
            0,
            1537920900000
        ],
        [
            192,
            0,
            768,
            0,
            0,
            0,
            0,
            0,
            0,
            1537921800000
        ],
        [
            191.39999999999998,
            0,
            765.5999999999999,
            0,
            0,
            0,
            0,
            0,
            0,
            1537922700000
        ]
    ]
}


df = pd.DataFrame(data["intervalData"], columns=data["intervalMetaData"])
print(df)

Output:

     whC  whG    max  maxGen  hC  hG  maxVar  maxGen  avgTemp      eventTime
0  175.2    0  700.8       0   0   0       0       0        0  1537920900000
1  192.0    0  768.0       0   0   0       0       0        0  1537921800000
2  191.4    0  765.6       0   0   0       0       0        0  1537922700000
  • Related