How do you parse nested JSON using Python?


I have a JSON output (shown further below as resp1) that I need to parse.

I also have this data frame with server info and the config data needed to call the API:

print(hyperEntitydf)

ServerName  Env   EnvId            HostId     Token
server123   prod  00000000000123   111111111  hhhh0000
server000   uat   000000000005555  2222222    y00000000

For each server, I need to call an API to get its data, as below:

import pandas as pd
import requests

data = []
finalDF = pd.DataFrame()

# metric, from_epoch and to_epoch are defined as strings earlier
for i in range(len(hyperEntitydf)):
    try:
        # api url
        processUrl = ("example.com/" + hyperEntitydf.iloc[i]['EnvId']
                      + "/api/v2/metrics/query?metricSelector=" + metric
                      + ":max&resolution=inf&from=" + from_epoch
                      + "&to=" + to_epoch
                      + '&entitySelector=Type("PROCESS_GROUP_INSTANCE"),'
                      + 'fromRelationships.isProcessof(entityId("'
                      + hyperEntitydf.iloc[i]['HostId'] + '"))')
        resp1 = requests.get(processUrl, verify=False,
                             headers={'Authorization': "Api-Token " + hyperEntitydf.iloc[i]['Token']}).json()
        for r in resp1['result']:
            metricId = r['metricId']
            for d in r['data']:
                dimension = d['dimensions'][0]
                timestamps = d['timestamps']
                values = d['values']
                for t, v in zip(timestamps, values):
                    data.append({'metricId': metricId, 'dimensions': dimension,
                                 'timestamps': t, 'values': v})
                    df = pd.DataFrame.from_records(data)
                    df.columns = ['Metric', 'ProcId', 'TimeStamp', 'Value']
                    df['Server'] = hyperEntitydf.iloc[i]['ServerName']
                    finalDF = finalDF.append(df)

    except Exception as e:
        print(e)

The output of print(resp1) looks like this:

{
  "totalCount": 39,
  "nextPageKey": null,
  "resolution": "Inf",
  "result": [
    {
      "metricId": "builtin:tech.generic.cpu.usage:max",
      "data": [
        {
          "dimensions": [
            "PROCESS_GROUP_123"
          ],
          "dimensionMap": {
            "dt.entity.process_group_instance": "PROCESS_GROUP_123"
          },
          "timestamps": [
            1647272580000
          ],
          "values": [
            1.3535032272338867
          ]
        },
        {
          "dimensions": [
            "PROCESS_GROUP_0000"
          ],
          "dimensionMap": {
            "dt.entity.process_group_instance": "PROCESS_GROUP_0000"
          },
          "timestamps": [
            1647272580000
          ],
          "values": [
            0.0390625
          ]
        },
        {
          "dimensions": [
            "PROCESS_GROUP_5555"
          ],
          "dimensionMap": {
            "dt.entity.process_group_instance": "PROCESS_GROUP_5555"
          },
          "timestamps": [
            1647272580000
          ],
          "values": [
            0
          ]
        }
      ]
    }
  ]
}

Once I execute this piece of code, it looks like I am duplicating data for each server.
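
I suspect that is because data is never cleared between servers, while df is rebuilt and re-appended to finalDF inside the innermost loop. A stripped-down sketch of the per-server accumulation I am aiming for (same resp1 shape and variables as above, not tested) would be:

rows = []
for r in resp1['result']:
    for d in r['data']:
        # timestamps and values are parallel lists per dimension
        for t, v in zip(d['timestamps'], d['values']):
            rows.append({'Metric': r['metricId'],
                         'ProcId': d['dimensions'][0],
                         'TimeStamp': t,
                         'Value': v})
df = pd.DataFrame(rows)
df['Server'] = hyperEntitydf.iloc[i]['ServerName']
finalDF = pd.concat([finalDF, df], ignore_index=True)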

My data frame needs to look like this:

Metric                              ProcId              TimeStamp       Value       Server
builtin:tech.generic.cpu.usage:max  PROCESS_GROUP_123   1647272580000   1.501560092 server123
builtin:tech.generic.cpu.usage:max  PROCESS_GROUP_0000  1647272580000   0.0390625   server123

As suggested by user @not_speshal, I have tried to parse the resp1 output as follows:

df4 = (pd.json_normalize(resp1["result"], 
                       record_path="data", 
                       meta="metricId")
        .explode(["dimensions", "timestamps", "values"])
        .drop("dimensions", axis=1)
      )
df4.columns = ["Timestamp", "Value", "ProcId", "Metric"]
print(df4)

I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-67-6c175c457389> in <module>
----> 1 df4 = (pd.json_normalize(resp1["result"], 
      2                        record_path="data",
      3                        meta="metricId")
      4         .explode(["dimensions", "timestamps", "values"])
      5         .drop("dimensions", axis=1)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in explode(self, column, ignore_index)
   7276         """
   7277         if not (is_scalar(column) or isinstance(column, tuple)):
-> 7278             raise ValueError("column must be a scalar")
   7279         if not self.columns.is_unique:
   7280             raise ValueError("columns must be unique")

ValueError: column must be a scalar

Any ideas what I am doing wrong here?

CodePudding user response:

IIUC, your pandas version predates 1.3.0, where DataFrame.explode first accepted a list of columns; that is why you get "column must be a scalar". Parse each JSON within your loop with json_normalize:

df = (pd.json_normalize(resp1["result"], 
                       record_path="data", 
                       meta="metricId")
        .drop("dimensions", axis=1)
      )
df.columns = ["Timestamp", "Value", "ProcId", "Metric"]

>>> df
         Timestamp                 Value              ProcId  \
0  [1647272580000]  [1.3535032272338867]   PROCESS_GROUP_123   
1  [1647272580000]           [0.0390625]  PROCESS_GROUP_0000   
2  [1647272580000]                   [0]  PROCESS_GROUP_5555   

                               Metric  
0  builtin:tech.generic.cpu.usage:max  
1  builtin:tech.generic.cpu.usage:max  
2  builtin:tech.generic.cpu.usage:max  
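
If you also need scalar rows plus the Server column, something like this sketch could slot into your loop (the request call, hyperEntitydf, and finalDF are from your question; moving the scalar columns into the index and applying Series.explode column-wise unpacks the parallel lists, and also works on pandas versions older than 1.3):

import pandas as pd

frames = []
for i in range(len(hyperEntitydf)):
    resp1 = ...  # your per-server requests.get(...).json() call
    df = (pd.json_normalize(resp1["result"],
                            record_path="data",
                            meta="metricId")
            .drop("dimensions", axis=1)
          )
    df.columns = ["Timestamp", "Value", "ProcId", "Metric"]
    # Park the scalar columns in the index, then explode the parallel
    # Timestamp/Value lists together; unlike explode([...]) this does
    # not require pandas >= 1.3
    df = (df.set_index(["ProcId", "Metric"])
            .apply(pd.Series.explode)
            .reset_index())
    df["Server"] = hyperEntitydf.iloc[i]["ServerName"]
    frames.append(df)

finalDF = pd.concat(frames, ignore_index=True)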