Home > Back-end >  Loading CSV into nested dictionary to pull out dynamic values
Loading CSV into nested dictionary to pull out dynamic values

Time:07-21

I get a CSV from our developers that looks like this (example values only)

Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR

I need my python script to be able to read this (values will grow and change over time) and give me the Template Name for a given Category and Sub-Category. I can make this work by reading in the CSV and looping through it, searching for the values I want, but it seems like there has to be an easier way.

If I load a JSON file like this, I can use json.load to convert things to a dict, then easily retrieve the value I want without having to loop through things.

{
    "Health Check": {
        "CPU": "checkCPU",
        "Memory": "checkMemory"
    },
    "Service Request": {
        "Reboot Device": "rebootDevice",
        "Check CPU": "checkCPU-SR"
    }
}

Then I use something like

import json
import csv

with open('categories.json','r') as f:
    myDict = json.load(f)
    print(myDict["Health Check"]["CPU"])

I'd much rather use the dict method, but I don't know if there's a way to achieve this from a CSV file. I've tried a few things like csv.dictreader or Pandas, but I can't get either of them to work. Pandas I could setup a key, but none of the values here are unique. With csv.dictreader, the nested fashion of this data (multiple keys/values under a single heading like Health Check) don't seem to work.

CodePudding user response:

If you use pandas then you don't need loop and you don't have to convert to JSON.

all_results = df[ (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU") ]["Template Name"] 

or more readable

mask = (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU")

all_results = df[mask]["Template Name"]

Miniamal working code.

I use io only to simulate file - so everyone can simply copy and test it - but you should use filename

text = '''Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR'''

import pandas as pd
import io

df = pd.read_csv(io.StringIO(text), sep=',\s*')  # I use `,\s*` to remove spaces after `,`
print(df)
print('---')

mask = (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU")

all_results = df[mask]["Template Name"]
print(all_results[0])

Result:

          Category   Sub-Category Template Name
0     Health Check            CPU      checkCPU
1     Health Check         Memory   checkMemory
2  Service Request  Reboot Device  rebootDevice
3  Service Request      Check CPU   checkCPU-SR
---
checkCPU

Using pandas you can easily select all items which can match some values - ie. all items which have substring CPU in Sub-Category

mask = df["Sub-Category"].str.contains("CPU")

all_results = df[mask]

for index, item in all_results.iterrows():
    print(item['Category'], '|', item["Sub-Category"], '|', item["Template Name"])

Result:

Health Check | CPU | checkCPU
Service Request | Check CPU | checkCPU-SR

CodePudding user response:

import json


def convert():
    output_dict = {}
    with open("<file_name>", "r") as source:
        file_lines = source.read().splitlines()
        file_lines.pop(0)  # Remove header line as we don't need it
    for line in file_lines:
        line_contents = line.split(",")  # CSV so split the line on ','
        if line_contents[0] not in output_dict:  # If first key isn't in output_dict, make it as a dict
            output_dict[line_contents[0]] = {}
        if line_contents[1] not in output_dict[line_contents[0]]:  # If second key isn't a sub-key of the first-key in output_dict,  make it a dict
            output_dict[line_contents[0]][line_contents[1]] = {}
        output_dict[line_contents[0]][line_contents[1]][line_contents[2]] = line_contents[3]  # Now add the entry based on the top two key and sub-key

    return output_dict  # return the dict


if __name__ == "__main__":
    print(json.dumps(convert(), indent=4))
  • Related