Home > Blockchain >  How do you get the desired dataframe from this JSON file with json_normalize
How do you get the desired dataframe from this JSON file with json_normalize

Time:04-20

I have this JSON file:

print(resp2)

{
  "entityId": "process_id_1234",
  "displayName": "splunkd - splunkd",
  "firstSeenTms": 1612578480000,
  "lastSeenTms": 1650316020000,
  "properties": {
    "detectedName": "splunkd",
    "bitness": "64",
    "metadata": [
      {
        "key": "EXE_NAME",
        "value": "splunkd"
      },
      {
        "key": "EXE_PATH",
        "value": "/opt/bin/splunkd"
      }
    ]
  }
}

2nd json example: In some cases, there are multiple key and value entries. I need the first occurence of key:EXE_Name, value: and key:EXE_PATH, value pairs. Is this possible?

{
  "entityId": "proc_1234",
  "displayName": "oracle12",
  "firstSeenTms": 1639034760000,
  "lastSeenTms": 1650386100000,
  "properties": {
    "detectedName": "oracle.sysman.gcagent.tmmain.TMMain",
    "bitness": "64",
    "jvmVendor": "IBM",
    "metadata": [
      {
        "key": "COMMAND_LINE_ARGS",
        "value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0"
      },
      {
        "key": "EXE_NAME",
        "value": "java"
      },
      {
        "key": "EXE_PATH",
        "value": "/usr/local/oracle/oem/agent*c/agent_*/oracle_common/jdk/bin/java"
      },
      {
        "key": "JAVA_MAIN_CLASS",
        "value": "oracle.sysman.gcagent.tmmain.TMMain"
      },
      {
        "key": "EXE_PATH",
        "value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0/oracle_common/jdk/bin/java"
      }
    ]
   }
}

I need to get entityId and under metadata section, I need to pull EXE_NAME and EXE_PATH and form a data frame that needs to look like this:

entityId          EXE_NAME    EXE_PATH
process_id_1234   splunkd     /opt/bin/splunkd

I have tried this:

procList = []
pd.json_normalize(resp2)

I also tried this:

pd.json_normalize(resp2, record_path='properties.metadata', meta=['key', 'value'])

But it gave a different result than I wanted.

CodePudding user response:

Since you say you want to build a pandas DataFrame, you can use json_normalize to build a DataFrame and pivot it:

df = (pd.json_normalize(resp2, record_path=['properties', 'metadata'],
                        meta=['entityId'])
     .pivot('entityId', 'key', 'value').reset_index())

Output:

key         entityId  EXE_NAME         EXE_PATH
0    process_id_1234  splunkd  /opt/bin/splunkd

Edit:

In the case of multiple "EXE_NAME"s or "EXE_PATH"s, you could use drop_duplicates before pivot:

out = (pd.json_normalize(resp2, record_path=['properties', 'metadata'],
                         meta=['entityId'])
       .drop_duplicates(subset=['key'])
       .query("key in ['EXE_NAME','EXE_PATH']")    # you could remove this line if you want the non-EXE columns as well.
       .pivot('entityId', 'key', 'value').reset_index()
      )

Output:

key   entityId EXE_NAME                                           EXE_PATH
0    proc_1234     java  /usr/local/oracle/oem/agent*c/agent_*/oracle_c...
  • Related