How to select, map and count data from JSON API with Python?

I am new to Python and am struggling to find the right method for the following:

I have 2 API responses, one is a list of devices, the other one is a list of organizations. Each device is linked to an organization with an Organization ID.

organizations = [
                 {
                  'name': 'Aperture Science Inc.', 
                  'description': 'Just a corporation!', 
                  'id': 1
                 }, 
                 {
                  'name': 'Software Development Inc', 
                  'description': "Making the world's next best app!", 
                  'id': 2
                 }
                ]

devices = [
           {
            'id': 1, 
            'organizationId': 2, 
            'nodeClass': 'WINDOWS_WORKSTATION', 
            'displayName': 'DESKTOP_01'
            },{
            'id': 2, 
            'organizationId': 2, 
            'nodeClass': 'WINDOWS_SERVER', 
            'displayName': 'SERVER_01'
            },{
            'id': 3, 
            'organizationId': 1, 
            'nodeClass': 'WINDOWS_WORKSTATION', 
            'displayName': 'DESKTOP_0123'
            }
           ]

The organizationId in devices matches the id in organizations. I want a result with the number of servers and workstations for each organization, like this:

results = [
           { 
            'Organization Name' : 'Aperture Science Inc.', 
            'Number of Workstations': 1, 
            'Number of Servers': 0,
            'Total devices': 1
           }, 
           { 
            'Organization Name' : 'Software Development Inc', 
            'Number of Workstations': 1, 
            'Number of Servers': 1,
            'Total devices': 2
           }
          ]

I started with this

wks_sum = sum(d.nodeClass == "WINDOWS_WORKSTATION" for d in devices)
print(wks_sum)

but I get this error:

AttributeError: 'dict' object has no attribute 'nodeClass'

At the very end I convert the results and save them to a CSV file:

df = pd.DataFrame(results)
df.to_csv('results.csv', index=False)

I am struggling to count each device type and to map the devices to the right organization name, and would really appreciate some help :)

EDIT:

Thanks to @Vincent, I could come up with:

for device in devices:
    for organization in organizations:
        # make sure both lists exist before anything is appended to them
        organization["workstations"] = organization.get("workstations", [])
        organization["servers"] = organization.get("servers", [])
        # skip organizations this device does not belong to
        if device["organizationId"] != organization["id"]:
            continue
        if device["nodeClass"] == "WINDOWS_SERVER":
            organization["servers"].append(device["nodeClass"])
        elif device["nodeClass"] == "WINDOWS_WORKSTATION":
            organization["workstations"].append(device["nodeClass"])
        # the matching organization was found, move on to the next device
        break

results = [
    {
        "Organization Name": organization["name"],
        "Number of Workstations": len(organization["workstations"]),
        "Number of Servers": len(organization["servers"]),
        "Total devices": len(organization["workstations"] + organization["servers"]),
    } for organization in organizations
]


# print(f"{results = }")
print(results)


# convert and save in a csv file (requires pandas)
import pandas as pd

df = pd.DataFrame(results)
df.to_csv('results.csv', index=False)

CodePudding user response：

This code will achieve your goal:

organizations = [
     {
      'name': 'Aperture Science Inc.', 
      'description': 'Just a corporation!', 
      'id': 1
     }, 
     {
      'name': 'Software Development Inc', 
      'description': "Making the world's next best app!", 
      'id': 2
     }
]

devices = [
    {
    'id': 1, 
    'organizationId': 2, 
    'nodeClass': 'WINDOWS_WORKSTATION', 
    'displayName': 'DESKTOP_01'
    },{
    'id': 2, 
    'organizationId': 2, 
    'nodeClass': 'WINDOWS_SERVER', 
    'displayName': 'SERVER_01'
    },{
    'id': 3, 
    'organizationId': 1, 
    'nodeClass': 'WINDOWS_WORKSTATION', 
    'displayName': 'DESKTOP_0123'
    }
]

for device in devices:
    for organization in organizations:
        organization["workstations"] = organization.get("workstations", [])
        organization["servers"] = organization.get("servers", [])
        if device["organizationId"] != organization["id"]:
            continue
        if device["displayName"].startswith("SERVER_"):
            organization["servers"].append(device["nodeClass"])
        elif device["displayName"].startswith("DESKTOP_"):
            organization["workstations"].append(device["nodeClass"])
        break
        
results = [
    {
        "Organization Name": organization["name"],
        "Number of Workstations": len(organization["workstations"]),
        "Number of Servers": len(organization["servers"]),
        "Total devices": len(organization["workstations"] + organization["servers"]),
    } for organization in organizations
]


print(f"{results = }")

Result:

results = [{'Organization Name': 'Aperture Science Inc.', 'Number of Workstations': 1, 'Number of Servers': 0, 'Total devices': 1}, {'Organization Name': 'Software Development Inc', 'Number of Workstations': 1, 'Number of Servers': 1, 'Total devices': 2}]
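
About the AttributeError in the question: the devices are plain dicts, so a value is read with d["nodeClass"] rather than d.nodeClass. Below is a rough sketch of that fix, together with a single-pass count that uses collections.Counter instead of the nested loop. It assumes the nodeClass of device 3 is meant to be 'WINDOWS_WORKSTATION', and the sample data is trimmed to the keys that are actually used; treat it as one possible variant, not the only way to do it.

```python
from collections import Counter

# Sample data trimmed to the keys used below (assumption: device 3 is a workstation)
organizations = [
    {"name": "Aperture Science Inc.", "id": 1},
    {"name": "Software Development Inc", "id": 2},
]
devices = [
    {"id": 1, "organizationId": 2, "nodeClass": "WINDOWS_WORKSTATION"},
    {"id": 2, "organizationId": 2, "nodeClass": "WINDOWS_SERVER"},
    {"id": 3, "organizationId": 1, "nodeClass": "WINDOWS_WORKSTATION"},
]

# Dict values are read with d["key"], not d.key
wks_sum = sum(d["nodeClass"] == "WINDOWS_WORKSTATION" for d in devices)

# Count every (organizationId, nodeClass) pair in one pass over the devices
counts = Counter((d["organizationId"], d["nodeClass"]) for d in devices)

results = []
for org in organizations:
    # A Counter returns 0 for pairs that never occurred
    workstations = counts[(org["id"], "WINDOWS_WORKSTATION")]
    servers = counts[(org["id"], "WINDOWS_SERVER")]
    results.append({
        "Organization Name": org["name"],
        "Number of Workstations": workstations,
        "Number of Servers": servers,
        "Total devices": workstations + servers,
    })

print(wks_sum)
print(results)
```

Here "Total devices" only adds up workstations and servers, so devices with any other nodeClass would not be included in it.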

Indeed, you can also do this with a library such as pandas, but I think plain, explicit code like this, even if it is slower, makes it clearer what is being done and is easier to modify if needed.
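
In the same spirit, the CSV step from the question does not strictly need pandas either. Here is a minimal sketch using csv.DictWriter from the standard library. The helper name write_results is made up for this example, and the results list is hard-coded so that the snippet runs on its own.

```python
import csv
import io

# Hard-coded copy of the expected results, so the snippet is self-contained
results = [
    {"Organization Name": "Aperture Science Inc.", "Number of Workstations": 1,
     "Number of Servers": 0, "Total devices": 1},
    {"Organization Name": "Software Development Inc", "Number of Workstations": 1,
     "Number of Servers": 1, "Total devices": 2},
]

def write_results(rows, fileobj):
    """Write a list of dicts as CSV, using the keys of the first row as the header."""
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Write into an in-memory buffer here; a real file works the same way
buffer = io.StringIO()
write_results(results, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To write an actual file instead:
# with open("results.csv", "w", newline="") as f:
#     write_results(results, f)
```

Opening the file with newline="" is what the csv module documentation recommends, so that no extra blank lines appear on Windows.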

To handle a large amount of data, you could load the two lists into two SQL tables, using sqlite3 for example, and do the counting in SQL.
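
A rough sketch of what that could look like with an in-memory SQLite database. The table and column names (organizations, devices, organization_id, node_class) are my own choice for this example, the sample data is trimmed to the keys that are used, and device 3 is assumed to be a 'WINDOWS_WORKSTATION'.

```python
import sqlite3

# Sample data trimmed to the keys used below (assumption: device 3 is a workstation)
organizations = [
    {"name": "Aperture Science Inc.", "id": 1},
    {"name": "Software Development Inc", "id": 2},
]
devices = [
    {"id": 1, "organizationId": 2, "nodeClass": "WINDOWS_WORKSTATION"},
    {"id": 2, "organizationId": 2, "nodeClass": "WINDOWS_SERVER"},
    {"id": 3, "organizationId": 1, "nodeClass": "WINDOWS_WORKSTATION"},
]

conn = sqlite3.connect(":memory:")  # use a file path to keep the data on disk
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE devices (id INTEGER PRIMARY KEY, organization_id INTEGER, node_class TEXT)"
)

# Load both API responses into their tables
conn.executemany(
    "INSERT INTO organizations VALUES (?, ?)",
    [(o["id"], o["name"]) for o in organizations],
)
conn.executemany(
    "INSERT INTO devices VALUES (?, ?, ?)",
    [(d["id"], d["organizationId"], d["nodeClass"]) for d in devices],
)

# LEFT JOIN keeps organizations without devices; COALESCE turns their NULL sums into 0
query = """
SELECT o.name,
       COALESCE(SUM(d.node_class = 'WINDOWS_WORKSTATION'), 0) AS workstations,
       COALESCE(SUM(d.node_class = 'WINDOWS_SERVER'), 0) AS servers,
       COUNT(d.id) AS total
FROM organizations AS o
LEFT JOIN devices AS d ON d.organization_id = o.id
GROUP BY o.id, o.name
ORDER BY o.id
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

Note that total here counts every device of the organization, whatever its node_class, which differs slightly from adding up only workstations and servers.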