My API generates a JSON file as output with the following structure:
[
{
"Data": {
"SensorId": "EF00BFC0",
"SensorName": "DEVICE 05",
"TimeStamp": 3877552823,
},
"ThresholdValue": {
"SensorSubId": 1,
"ThresholdMax": -1,
"ThresholdMin": -1
},
"startUTC": 1668564000,
"endUTC": 1668663000
},
{
"Data": {
"SensorId": "EF00BFC0",
"SensorName": "DEVICE 05",
"TimeStamp": 3877553446,
},
"ThresholdValue": {
"SensorSubId": 1,
"ThresholdMax": -1,
"ThresholdMin": -1
},
"startUTC": 1668564000,
"endUTC": 1668663000
}]
I want to generate a Pandas DataFrame with SensorID, SensorName, Timestamp, startUTC, and endUTC as columns. I tried it using the below code.
df_nested_list = pd.json_normalize(
data,
record_path =['Data'],
meta=['startUTC', 'endUTC']
)
But it gives the column names as rows and generate the dataframe. How can I do this in pandas?
CodePudding user response:
You can use pd.json_normalize()
:
import pandas as pd
data = [
{
"Data": {
"SensorId": "EF00BFC0",
"SensorName": "DEVICE 05",
"TimeStamp": 3877552823,
},
"ThresholdValue": {
"SensorSubId": 1,
"ThresholdMax": -1,
"ThresholdMin": -1
},
"startUTC": 1668564000,
"endUTC": 1668663000
},
{
"Data": {
"SensorId": "EF00BFC0",
"SensorName": "DEVICE 05",
"TimeStamp": 3877553446,
},
"ThresholdValue": {
"SensorSubId": 1,
"ThresholdMax": -1,
"ThresholdMin": -1
},
"startUTC": 1668564000,
"endUTC": 1668663000
}]
df = pd.json_normalize(data)
df.columns = [c.replace('Data.', '') for c in df.columns]
cols_to_keep = [c for c in df.columns if 'ThresholdValue' not in c]
df = df[cols_to_keep]
df.head()
Output:
startUTC | endUTC | SensorId | SensorName | TimeStamp |
---|---|---|---|---|
0 | 1668564000 | 1668663000 | EF00BFC0 | DEVICE 05 |
1 | 1668564000 | 1668663000 | EF00BFC0 | DEVICE 05 |
CodePudding user response:
See if this helps,
import pandas as pd
data = [
{
"Data": {
"SensorId": "EF00BFC0",
"SensorName": "DEVICE 05",
"TimeStamp": 3877552823,
},
"ThresholdValue": {
"SensorSubId": 1,
"ThresholdMax": -1,
"ThresholdMin": -1
},
"startUTC": 1668564000,
"endUTC": 1668663000
}]
formatted_data = [
{
"SensorId": item["Data"]["SensorId"],
"SensorName": item["Data"]["SensorName"],
"Timestamp": item["Data"]["TimeStamp"],
"startUTC": item["startUTC"],
"endUTC": item["endUTC"]
}
for item in data
]
df = pd.DataFrame(formatted_data)
print(df)