I am a complete novice with Python, so any help or pointers are appreciated. I have an input .csv file with ~500,000 rows of data that looks like this:
dwelling,wall,weather,occ
5,2,Ldn,Pen
5,4,Ldn,Pen
3,4,Ldn,Pen
For each combination of input variables, there is a folder with 'results' for that combination. I want to route each row to the right folder depending on its inputs. I thought of using a lookup .csv file to match the inputs with the output folder, like the following:
dwelling,wall,weather,occ,folder
5,2,Ldn,Pen,Semi_detached_solid
5,4,Ldn,Pen,Semi_detached_cavity
3,4,Ldn,Pen,Detached_cavity
But I'm not sure where to even start. Note that the input file needs to be dynamic and so I can't just add on another column with the folder name (I don't think so anyway).
EDIT: there are additional columns (of continuous data) in the input.csv, for example:
dwelling,wall,weather,occ,height,temp
5,2,Ldn,Pen,154.7,23.4
5,4,Ldn,Pen,172.4,28.7
3,4,Ldn,Pen,183.5,21,2
These additional values need to be routed to the output folder, but they don't need to be in the look_up.csv (as matching is only done on categorical variables).
CodePudding user response:
There are several ways to go about this, but I would also use some kind of mapping.
If you read a mapping CSV file that has the columns you need to route by plus a target folder, as in your example, I would load it into a dict for key/value lookups. (Dict lookups take roughly constant time, so they stay fast even for ~500,000 rows.)
The logic is to turn the columns you need to route by into a key and use the target folder as the value.
Then you iterate over your actual CSV, turn the relevant columns into a key in the same way, and use that key to look up the target folder.
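To make the key idea concrete, here is a minimal sketch that uses in-memory rows instead of files. It builds the key as a tuple rather than a joined string, which is a variation of mine: a tuple cannot collide if a value ever happens to contain the separator character. The names `KEY_COLUMNS` and `row_key` are made up for this illustration.

```python
# Hypothetical names: KEY_COLUMNS and row_key are illustrative only.
KEY_COLUMNS = ("dwelling", "wall", "weather", "occ")


def row_key(row: dict) -> tuple:
    # Build a hashable key from the categorical columns only;
    # extra columns such as height/temp are ignored for matching.
    return tuple(row[col] for col in KEY_COLUMNS)


# Rows as csv.DictReader would yield them (all values are strings)
mapping_rows = [
    {"dwelling": "5", "wall": "2", "weather": "Ldn", "occ": "Pen", "folder": "Semi_detached_solid"},
    {"dwelling": "5", "wall": "4", "weather": "Ldn", "occ": "Pen", "folder": "Semi_detached_cavity"},
    {"dwelling": "3", "wall": "4", "weather": "Ldn", "occ": "Pen", "folder": "Detached_cavity"},
]
mapping = {row_key(r): r["folder"] for r in mapping_rows}

data_row = {"dwelling": "5", "wall": "4", "weather": "Ldn", "occ": "Pen",
            "height": "172.4", "temp": "28.7"}
# dict.get returns None when the combination is not in the mapping
target = mapping.get(row_key(data_row))
print(target)
```

Either key style works; the important part is that the mapping and the data rows build their keys in exactly the same way.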
I've used the samples you provided as the input files (mapping.csv and data.csv) below:
import csv


def csv_dict_to_key(row: dict) -> str:
    # convert the row values into a key
    return f"{row['dwelling']}_{row['wall']}_{row['weather']}_{row['occ']}"


def create_mapping(filename: str) -> dict:
    # go over the mapping file and return a dict
    # which has the mapping
    result = {}
    with open(filename) as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            key = csv_dict_to_key(row)
            if key in result:
                # a later duplicate overwrites the earlier entry
                print(f"FOUND DUPLICATE {key}")
            result[key] = row["folder"]
    return result


mapping = create_mapping("mapping.csv")

with open("data.csv") as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        key = csv_dict_to_key(row)
        if key in mapping:
            # do whatever you need here, I'm just using print as an example
            print(f"----\nROW: {row}\nKEY: {key}\nTARGET: {mapping[key]}")
        else:
            print(f"----\nKEY {key} not found in mapping")
Output:
----
ROW: {'dwelling': '5', 'wall': '2', 'weather': 'Ldn', 'occ': 'Pen', 'height': '154.7', 'temp': '23.4'}
KEY: 5_2_Ldn_Pen
TARGET: Semi_detached_solid
----
ROW: {'dwelling': '5', 'wall': '4', 'weather': 'Ldn', 'occ': 'Pen', 'height': '172.4', 'temp': '28.7'}
KEY: 5_4_Ldn_Pen
TARGET: Semi_detached_cavity
----
ROW: {'dwelling': '3', 'wall': '4', 'weather': 'Ldn', 'occ': 'Pen', 'height': '183.5', 'temp': '21', None: ['2']}
KEY: 3_4_Ldn_Pen
TARGET: Detached_cavity
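To actually route the rows rather than print them, one option is to append each row to a CSV file inside its target folder. The following is a rough sketch under several assumptions: the function name `route_rows`, the output file name `routed.csv` and the `base_dir` parameter are placeholders I made up, and the rows are passed in as already-parsed dicts (for example from `csv.DictReader`). It also skips rows that have more fields than the header; `csv.DictReader` stores such surplus fields under the `None` key by default, which is what happened with the third row in the output above.

```python
import csv
import os

KEY_COLUMNS = ["dwelling", "wall", "weather", "occ"]


def csv_dict_to_key(row: dict) -> str:
    # Same key format as the lookup code: e.g. "5_2_Ldn_Pen"
    return "_".join(row[col] for col in KEY_COLUMNS)


def route_rows(rows, mapping: dict, base_dir: str, out_name: str = "routed.csv") -> dict:
    """Append each row to <base_dir>/<folder>/<out_name>.

    Hypothetical helper; returns the number of rows written per folder.
    """
    counts = {}
    for row in rows:
        if None in row:
            # Surplus fields ended up under the None key: malformed row
            print(f"Skipping malformed row: {row}")
            continue
        folder = mapping.get(csv_dict_to_key(row))
        if folder is None:
            print(f"No folder found for row: {row}")
            continue
        target_dir = os.path.join(base_dir, folder)
        os.makedirs(target_dir, exist_ok=True)
        path = os.path.join(target_dir, out_name)
        # Only write the header when the file is created
        write_header = not os.path.exists(path)
        with open(path, "a", newline="") as outfile:
            writer = csv.DictWriter(outfile, fieldnames=list(row.keys()))
            if write_header:
                writer.writeheader()
            writer.writerow(row)
        counts[folder] = counts.get(folder, 0) + 1
    return counts
```

You would call it as `route_rows(reader, mapping, "results")` inside the `with open("data.csv")` block. Reopening the output file for every row keeps the sketch simple; for ~500,000 rows it would probably be worth grouping the rows per folder first, or keeping the output files open, but I have not measured that.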