Home > Software design >  how to retrieve data from json file using python
how to retrieve data from json file using python

Time:05-25

I'm doing api requests to get json file to be parsed and converted into data frames. Json file sometimes may have empty fields, I am posting 2 possible cases where 1st json fill have the field I am looking for and the 2nd json file has that field empty.

1st json file:

print(resp2)

{
  "entityId": "proc_1234",
  "displayName": "oracle12",
  "firstSeenTms": 1639034760000,
  "lastSeenTms": 1650386100000,
  "properties": {
    "detectedName": "oracle.sysman.gcagent.tmmain.TMMain",
    "bitness": "64",
    "jvmVendor": "IBM",
    "metadata": [
      {
        "key": "COMMAND_LINE_ARGS",
        "value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0"
      },
      {
        "key": "EXE_NAME",
        "value": "java"
      },
      {
        "key": "EXE_PATH",
        "value": "/usr/local/oracle/oem/agent*c/agent_*/oracle_common/jdk/bin/java"
      },
      {
        "key": "JAVA_MAIN_CLASS",
        "value": "oracle.sysman.gcagent.tmmain.TMMain"
      },
      {
        "key": "EXE_PATH",
        "value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0/oracle_common/jdk/bin/java"
      }
    ]
   }
}

2nd Json file:

print(resp2)

{
  "entityId": "PROCESS_GROUP_INSTANCE-FB8C65551916D57D",
  "displayName": "Windows System",
  "firstSeenTms": 1619147697131,
  "lastSeenTms": 1653404640000,
  "properties": {
    "detectedName": "Windows System",
    "bitness": "32",
    "metadata": [],
    "awsNameTag": "Windows System",
    "softwareTechnologies": [
      {
        "type": "WINDOWS_SYSTEM"
      }
    ],
    "processType": "WINDOWS_SYSTEM"
 
  }
}

as you can see metadata": [] empty.

I need to extract entityId, detectedName and if metada has data, I need to get EXE_NAME and EXE_PATH. if metada section is empty, I still need to get the entityId and detectedName from this json file and form a data frame.

so, I have done this:

        #retrieve the detecteName value from the json
        det_name = list(resp2.get('properties','detectedName').values())[0]
        #retrieve EXE_NAME, EXE_PATH and entityId from the json. This part works when metata section has data
        Procdf=(pd.json_normalize(resp2, record_path=['properties', 'metadata'], meta=['entityId']).drop_duplicates(subset=['key']).query("key in ['EXE_NAME','EXE_PATH']").assign(detectedName=det_name).pivot('entityId', 'key', 'value').reset_index())
       #Add detectedName to the Procdf data frame
        Procdf["detectedName"] = det_name

this above code snippet works when metadata has data, if it has no data [], I still need to create a data frame with entityId, detectedName and EXE_NAME and EXE_PATH being empty.

how can I do this? Right now when metadat[], I get this error name 'key' is not defined and skipps that json.

CodePudding user response:

Why not create a new dict based on whether there's value for metadata or not?

Here's an example (this should work with both response types):

import pandas as pd


def find_value(response: dict, key: str) -> str:
    result = [
      x['value'] for x in response['properties']['metadata']
      if x['key'] == key
    ]
    return result[0] if result else ""


def get_values(response: dict) -> dict:
    return {
              "entityId": response['entityId'],
              "displayName": response['displayName'],
              "EXE_NAME": find_value(response, 'EXE_NAME'),
              "EXE_PATH": find_value(response, 'EXE_PATH'),
            }


sample_response = {
    "entityId": "PROCESS_GROUP_INSTANCE-FB8C65551916D57D",
    "displayName": "Windows System",
    "firstSeenTms": 1619147697131,
    "lastSeenTms": 1653404640000,
    "properties": {
        "detectedName": "Windows System",
        "bitness": "32",
        "metadata": [],
        "awsNameTag": "Windows System",
        "softwareTechnologies": [
            {
                "type": "WINDOWS_SYSTEM"
            }
        ],
        "processType": "WINDOWS_SYSTEM"

    }
}

print(pd.json_normalize(get_values(sample_response)))

Sample output for metadata being empty:

                                  entityId     displayName EXE_NAME EXE_PATH
0  PROCESS_GROUP_INSTANCE-FB8C65551916D57D  Windows System            

And one when metadata carries, well, data:

    entityId  ...                                           EXE_PATH
0  proc_1234  ...  /usr/local/oracle/oem/agent*c/agent_*/oracle_c...

CodePudding user response:

Just check if metadata is empty if resp2.get("metadata") == []. If it is then just do the following:

Procdf = pd.DataFrame(columns=["entityId", "detectedName", "EXE_NAME", "EXE_PATH"]).

If it isn't then use your code.

  • Related