Home > Net >  Convert json nested dictionary to pandas dataframe
Convert json nested dictionary to pandas dataframe

Time:12-06

My API generates a JSON file as output with the following structure:

[
{
    "Data": {
        "SensorId": "EF00BFC0",
        "SensorName": "DEVICE 05",
        "TimeStamp": 3877552823,
    },
    "ThresholdValue": {
        "SensorSubId": 1,
        "ThresholdMax": -1,
        "ThresholdMin": -1
    },
    "startUTC": 1668564000,
    "endUTC": 1668663000
},
{
    "Data": {
        "SensorId": "EF00BFC0",
        "SensorName": "DEVICE 05",
        "TimeStamp": 3877553446,
    },
    "ThresholdValue": {
        "SensorSubId": 1,
        "ThresholdMax": -1,
        "ThresholdMin": -1
    },
    "startUTC": 1668564000,
    "endUTC": 1668663000
}]

I want to generate a Pandas DataFrame with SensorID, SensorName, Timestamp, startUTC, and endUTC as columns. I tried it using the below code.

df_nested_list = pd.json_normalize(
    data, 
    record_path =['Data'], 
    meta=['startUTC', 'endUTC']
)

But it gives the column names as rows and generate the dataframe. How can I do this in pandas?

CodePudding user response:

You can use pd.json_normalize():

import pandas as pd

data = [
{
    "Data": {
        "SensorId": "EF00BFC0",
        "SensorName": "DEVICE 05",
        "TimeStamp": 3877552823,
    },
    "ThresholdValue": {
        "SensorSubId": 1,
        "ThresholdMax": -1,
        "ThresholdMin": -1
    },
    "startUTC": 1668564000,
    "endUTC": 1668663000
},
{
    "Data": {
        "SensorId": "EF00BFC0",
        "SensorName": "DEVICE 05",
        "TimeStamp": 3877553446,
    },
    "ThresholdValue": {
        "SensorSubId": 1,
        "ThresholdMax": -1,
        "ThresholdMin": -1
    },
    "startUTC": 1668564000,
    "endUTC": 1668663000
}]
df = pd.json_normalize(data)
df.columns = [c.replace('Data.', '') for c in df.columns]
cols_to_keep = [c for c in df.columns if 'ThresholdValue' not in c]
df = df[cols_to_keep]
df.head()

Output:

startUTC endUTC SensorId SensorName TimeStamp
0 1668564000 1668663000 EF00BFC0 DEVICE 05
1 1668564000 1668663000 EF00BFC0 DEVICE 05

CodePudding user response:

See if this helps,

import pandas as pd

data = [
{
    "Data": {
        "SensorId": "EF00BFC0",
        "SensorName": "DEVICE 05",
        "TimeStamp": 3877552823,
    },
    "ThresholdValue": {
        "SensorSubId": 1,
        "ThresholdMax": -1,
        "ThresholdMin": -1
    },
    "startUTC": 1668564000,
    "endUTC": 1668663000
}]

formatted_data = [
    {
        "SensorId": item["Data"]["SensorId"],
        "SensorName": item["Data"]["SensorName"],
        "Timestamp": item["Data"]["TimeStamp"],
        "startUTC": item["startUTC"],
        "endUTC": item["endUTC"]
    }
    for item in data
]

df = pd.DataFrame(formatted_data)
print(df)
  • Related