Home > database >  Pandas Read Nested Json Data
Pandas Read Nested Json Data

Time:10-05

I have this JSON string:

{
    "2022-09-28T00:45:00.000Z": [
        {
            "value": 0.216,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.224,
            "plantId": "30082443",
            "man": "pippo"
        }
    ],
    "2022-09-28T00:30:00.000Z": [
        {
            "value": 0.248,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.108,
            "plantId": "30082443",
            "man": "pippo"
        }
    ]
}

I would like to import the data in order to build a numpy matrix indexed as (plantId, time). For example, I would get a matrix like:

  300082443 0.216 0.108
  27050937  0.224 0.248
  ....

I tried with pandas:

a =pd.read_json("./data.json",orient='index')

But I get:

2022-09-28 00:45:00 00:00  {'value': 0.216, 'plantId': '27050937', 'man...  {'value': 0.224, 'plantId': '30082443', 'man...
2022-09-28 00:30:00 00:00  {'value': 0.248, 'plantId': '27050937', 'man...  {'value': 0.108, 'plantId': '30082443', 'man...

CodePudding user response:

Use list comprehension for merge dates (outer keys) to nested dictionaries and pass to DataFrame constructor:

import json

with open('./data.json') as data_file:    
    data = json.load(data_file)  


df = pd.DataFrame([{**{'date': k}, **x} for k, v in data.items() for x in v])
print (df)
                       date  value   plantId    man
0  2022-09-28T00:45:00.000Z  0.216  27050937  pippo
1  2022-09-28T00:45:00.000Z  0.224  30082443  pippo
2  2022-09-28T00:30:00.000Z  0.248  27050937  pippo
3  2022-09-28T00:30:00.000Z  0.108  30082443  pippo

CodePudding user response:

it works for me:

df=pd.DataFrame(a)
df = df.join(pd.json_normalize(df.pop('2022-09-28T00:45:00.000Z')))
df = pd.concat([df,(pd.json_normalize(df.pop('2022-09-28T00:30:00.000Z')))])
df2=df.groupby('plantId')['value'].agg(list).reset_index()
df2[['value_1','value_2']] = pd.DataFrame(df2.value.tolist(), index= df2.index)
print(df2)
'''
    plantId           value  value_1  value_2
0  27050937  [0.216, 0.248]    0.216    0.248
1  30082443  [0.224, 0.108]    0.224    0.108
'''
  • Related