I have this data frame, which holds server info and the config data needed to call an API:
print(hyperEntitydf)
ServerName   Env   EnvId            HostId     Token
server123    prod  00000000000123   111111111  hhhh0000
server000    uat   000000000005555  2222222    y00000000
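For a reproducible reference, the frame can be rebuilt roughly like this (the last column name, Token, is assumed from the loop below, which reads hyperEntitydf.iloc[i]['Token']):

import pandas as pd

# Placeholder reconstruction of the frame above; values are the
# anonymized ones from the print-out, and 'Token' is an assumed
# column name based on the loop below.
hyperEntitydf = pd.DataFrame({
    "ServerName": ["server123", "server000"],
    "Env":        ["prod", "uat"],
    "EnvId":      ["00000000000123", "000000000005555"],
    "HostId":     ["111111111", "2222222"],
    "Token":      ["hhhh0000", "y00000000"],
})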
For each server, I need to call an API to get its data, as below:
for i in range(0, len(hyperEntitydf)):
    try:
        # Build the API URL for this server
        processUrl = ("example.com/" + hyperEntitydf.iloc[i]['EnvId']
                      + "/api/v2/metrics/query?metricSelector=" + metric
                      + ":max&resolution=inf&from=" + from_epoch
                      + "&to=" + to_epoch
                      + '&entitySelector=Type("PROCESS_GROUP_INSTANCE"),'
                      + 'fromRelationships.isProcessof(entityId("'
                      + hyperEntitydf.iloc[i]['HostId'] + '"))')
        resp1 = requests.get(processUrl, verify=False,
                             headers={'Authorization': "Api-Token " + hyperEntitydf.iloc[i]['Token']}).json()
        for r in resp1['result']:
            metricId = r['metricId']
            for d in r['data']:
                dimension = d['dimensions'][0]
                timestamps = d['timestamps']
                values = d['values']
                for t, v in zip(timestamps, values):
                    data.append({'metricId': metricId, 'dimensions': dimension,
                                 'timestamps': t, 'values': v})
        df = pd.DataFrame.from_records(data)
        df.columns = ['Metric', 'ProcId', 'TimeStamp', 'Value']
        df['Server'] = hyperEntitydf.iloc[i]['ServerName']
        finalDF = finalDF.append(df)
    except Exception as e:
        print(e)
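For reference, the names used above are set up beforehand along these lines (values are placeholders):

import requests
import pandas as pd

metric = "builtin:tech.generic.cpu.usage"  # metric queried (see resp1 below)
from_epoch = "1647270000000"               # query window start, epoch ms (placeholder)
to_epoch = "1647273600000"                 # query window end, epoch ms (placeholder)

data = []                 # rows parsed out of each response
finalDF = pd.DataFrame()  # accumulates the per-server frames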
The output of print(resp1) is:
{
"totalCount": 39,
"nextPageKey": null,
"resolution": "Inf",
"result": [
{
"metricId": "builtin:tech.generic.cpu.usage:max",
"data": [
{
"dimensions": [
"PROCESS_GROUP_123"
],
"dimensionMap": {
"dt.entity.process_group_instance": "PROCESS_GROUP_123"
},
"timestamps": [
1647272580000
],
"values": [
1.3535032272338867
]
},
{
"dimensions": [
"PROCESS_GROUP_0000"
],
"dimensionMap": {
"dt.entity.process_group_instance": "PROCESS_GROUP_0000"
},
"timestamps": [
1647272580000
],
"values": [
0.0390625
]
},
{
"dimensions": [
"PROCESS_GROUP_5555"
],
"dimensionMap": {
"dt.entity.process_group_instance": "PROCESS_GROUP_5555"
},
"timestamps": [
1647272580000
],
"values": [
0
]
}
]
}
]
}
Once I execute this piece of code, it looks like I am duplicating data for each server.
My data frame needs to look like this:
Metric                              ProcId              TimeStamp      Value        Server
builtin:tech.generic.cpu.usage:max  PROCESS_GROUP_123   1647272580000  1.501560092  server123
builtin:tech.generic.cpu.usage:max  PROCESS_GROUP_0000  1647272580000  0.0390625    server123
As suggested by user @not_speshal, I have tried to parse the resp1 output as follows:
df4 = (pd.json_normalize(resp1["result"],
                         record_path="data",
                         meta="metricId")
         .explode(["dimensions", "timestamps", "values"])
         .drop("dimensions", axis=1)
      )
df4.columns = ["Timestamp", "Value", "ProcId", "Metric"]
print(df4)
I get this error:
ValueError Traceback (most recent call last)
<ipython-input-67-6c175c457389> in <module>
----> 1 df4 = (pd.json_normalize(resp1["result"],
2 record_path="data",
3 meta="metricId")
4 .explode(["dimensions", "timestamps", "values"])
5 .drop("dimensions", axis=1)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in explode(self, column, ignore_index)
7276 """
7277 if not (is_scalar(column) or isinstance(column, tuple)):
-> 7278 raise ValueError("column must be a scalar")
7279 if not self.columns.is_unique:
7280 raise ValueError("columns must be unique")
ValueError: column must be a scalar
Any ideas what I am doing wrong here?
CodePudding user response:
IIUC, parse each JSON within your loop with json_normalize:
df = (pd.json_normalize(resp1["result"],
                        record_path="data",
                        meta="metricId")
        .drop("dimensions", axis=1)
     )
df.columns = ["Timestamp", "Value", "ProcId", "Metric"]
>>> df
Timestamp Value ProcId \
0 [1647272580000] [1.3535032272338867] PROCESS_GROUP_123
1 [1647272580000] [0.0390625] PROCESS_GROUP_0000
2 [1647272580000] [0] PROCESS_GROUP_5555
Metric
0 builtin:tech.generic.cpu.usage:max
1 builtin:tech.generic.cpu.usage:max
2 builtin:tech.generic.cpu.usage:max
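Side note on the ValueError above: passing a list of columns to DataFrame.explode is only supported from pandas 1.3 onward, which is why older versions raise "column must be a scalar". A sketch that works on older pandas too (the helper name parse_response is mine, and the set_index/apply idiom stands in for the multi-column explode):

def parse_response(resp, server_name):
    # Normalize one API response into a tidy frame, as shown above.
    df = (pd.json_normalize(resp["result"], record_path="data", meta="metricId")
            .drop("dimensions", axis=1))
    df.columns = ["Timestamp", "Value", "ProcId", "Metric"]
    # pandas < 1.3 cannot explode several columns in one call; exploding
    # each list-valued column via apply gives the same flattened result.
    df = df.set_index(["ProcId", "Metric"]).apply(pd.Series.explode).reset_index()
    df["Server"] = server_name
    return df

Collecting one frame per server inside the request loop with frames.append(parse_response(resp1, hyperEntitydf.iloc[i]['ServerName'])) and finishing with finalDF = pd.concat(frames, ignore_index=True) also avoids re-appending previously collected rows, which would explain the duplication you are seeing if data is never reset between iterations.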