I have this JSON file:
print(resp2)
{
"entityId": "process_id_1234",
"displayName": "splunkd - splunkd",
"firstSeenTms": 1612578480000,
"lastSeenTms": 1650316020000,
"properties": {
"detectedName": "splunkd",
"bitness": "64",
"metadata": [
{
"key": "EXE_NAME",
"value": "splunkd"
},
{
"key": "EXE_PATH",
"value": "/opt/bin/splunkd"
}
]
}
}
2nd json example: In some cases, there are multiple key and value entries. I need the first occurence of key:EXE_Name, value: and key:EXE_PATH, value pairs. Is this possible?
{
"entityId": "proc_1234",
"displayName": "oracle12",
"firstSeenTms": 1639034760000,
"lastSeenTms": 1650386100000,
"properties": {
"detectedName": "oracle.sysman.gcagent.tmmain.TMMain",
"bitness": "64",
"jvmVendor": "IBM",
"metadata": [
{
"key": "COMMAND_LINE_ARGS",
"value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0"
},
{
"key": "EXE_NAME",
"value": "java"
},
{
"key": "EXE_PATH",
"value": "/usr/local/oracle/oem/agent*c/agent_*/oracle_common/jdk/bin/java"
},
{
"key": "JAVA_MAIN_CLASS",
"value": "oracle.sysman.gcagent.tmmain.TMMain"
},
{
"key": "EXE_PATH",
"value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0/oracle_common/jdk/bin/java"
}
]
}
}
I need to get entityId
and under metadata
section, I need to pull EXE_NAME
and EXE_PATH
and form a data frame that needs to look like this:
entityId EXE_NAME EXE_PATH
process_id_1234 splunkd /opt/bin/splunkd
I have tried this:
procList = []
pd.json_normalize(resp2)
I also tried this:
pd.json_normalize(resp2, record_path='properties.metadata', meta=['key', 'value'])
But it gave a different result than I wanted.
CodePudding user response:
Since you say you want to build a pandas DataFrame, you can use json_normalize
to build a DataFrame and pivot
it:
df = (pd.json_normalize(resp2, record_path=['properties', 'metadata'],
meta=['entityId'])
.pivot('entityId', 'key', 'value').reset_index())
Output:
key entityId EXE_NAME EXE_PATH
0 process_id_1234 splunkd /opt/bin/splunkd
Edit:
In the case of multiple "EXE_NAME"s or "EXE_PATH"s, you could use drop_duplicates
before pivot
:
out = (pd.json_normalize(resp2, record_path=['properties', 'metadata'],
meta=['entityId'])
.drop_duplicates(subset=['key'])
.query("key in ['EXE_NAME','EXE_PATH']") # you could remove this line if you want the non-EXE columns as well.
.pivot('entityId', 'key', 'value').reset_index()
)
Output:
key entityId EXE_NAME EXE_PATH
0 proc_1234 java /usr/local/oracle/oem/agent*c/agent_*/oracle_c...