I get a CSV from our developers that looks like this (example values only)
Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR
I need my python script to be able to read this (values will grow and change over time) and give me the Template Name for a given Category and Sub-Category. I can make this work by reading in the CSV and looping through it, searching for the values I want, but it seems like there has to be an easier way.
If I load a JSON file like this, I can use json.load to convert things to a dict, then easily retrieve the value I want without having to loop through things.
{
"Health Check": {
"CPU": "checkCPU",
"Memory": "checkMemory"
},
"Service Request": {
"Reboot Device": "rebootDevice",
"Check CPU": "checkCPU-SR"
}
}
Then I use something like
import json
import csv
with open('categories.json','r') as f:
myDict = json.load(f)
print(myDict["Health Check"]["CPU"])
I'd much rather use the dict method, but I don't know if there's a way to achieve this from a CSV file. I've tried a few things like csv.dictreader or Pandas, but I can't get either of them to work. Pandas I could setup a key, but none of the values here are unique. With csv.dictreader, the nested fashion of this data (multiple keys/values under a single heading like Health Check) don't seem to work.
CodePudding user response:
If you use pandas
then you don't need loop
and you don't have to convert to JSON
.
all_results = df[ (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU") ]["Template Name"]
or more readable
mask = (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU")
all_results = df[mask]["Template Name"]
Miniamal working code.
I use io
only to simulate file - so everyone can simply copy and test it - but you should use filename
text = '''Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR'''
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text), sep=',\s*') # I use `,\s*` to remove spaces after `,`
print(df)
print('---')
mask = (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU")
all_results = df[mask]["Template Name"]
print(all_results[0])
Result:
Category Sub-Category Template Name
0 Health Check CPU checkCPU
1 Health Check Memory checkMemory
2 Service Request Reboot Device rebootDevice
3 Service Request Check CPU checkCPU-SR
---
checkCPU
Using pandas
you can easily select all items which can match some values - ie. all items which have substring CPU
in Sub-Category
mask = df["Sub-Category"].str.contains("CPU")
all_results = df[mask]
for index, item in all_results.iterrows():
print(item['Category'], '|', item["Sub-Category"], '|', item["Template Name"])
Result:
Health Check | CPU | checkCPU
Service Request | Check CPU | checkCPU-SR
CodePudding user response:
import json
def convert():
output_dict = {}
with open("<file_name>", "r") as source:
file_lines = source.read().splitlines()
file_lines.pop(0) # Remove header line as we don't need it
for line in file_lines:
line_contents = line.split(",") # CSV so split the line on ','
if line_contents[0] not in output_dict: # If first key isn't in output_dict, make it as a dict
output_dict[line_contents[0]] = {}
if line_contents[1] not in output_dict[line_contents[0]]: # If second key isn't a sub-key of the first-key in output_dict, make it a dict
output_dict[line_contents[0]][line_contents[1]] = {}
output_dict[line_contents[0]][line_contents[1]][line_contents[2]] = line_contents[3] # Now add the entry based on the top two key and sub-key
return output_dict # return the dict
if __name__ == "__main__":
print(json.dumps(convert(), indent=4))