Match input .csv to output folder in Python

I am a complete novice with Python, so any help or pointers are appreciated. I have an input .csv file with ~500,000 rows of data that looks like this:

dwelling,wall,weather,occ
5,2,Ldn,Pen
5,4,Ldn,Pen
3,4,Ldn,Pen

For each combination of input variables, there is a folder with 'results' for that combination. I want to route each row to the matching folder depending on its inputs. I thought of using a lookup .csv file to match the inputs with the output folder, like the following:

dwelling,wall,weather,occ,folder
5,2,Ldn,Pen,Semi_detached_solid
5,4,Ldn,Pen,Semi_detached_cavity
3,4,Ldn,Pen,Detached_cavity

But I'm not sure where to even start. Note that the input file needs to be dynamic, so I can't just add another column with the folder name (I don't think so, anyway).

EDIT: there are additional columns (of continuous data) in the input.csv, for example:

dwelling,wall,weather,occ,height,temp
5,2,Ldn,Pen,154.7,23.4
5,4,Ldn,Pen,172.4,28.7
3,4,Ldn,Pen,183.5,21,2

These additional values need to be routed to the output folder, but they don't need to be in the look_up.csv (as matching is only done on categorical variables).

CodePudding user response：

There are several ways to go about this, but I would also use some kind of mapping.

If you read a mapping CSV file that has the columns you need to route by plus a target folder, like in your example, I would use a dict for key/value lookups. (Dict lookups take constant time on average, so they stay fast even for ~500,000 rows.)

The logic is to turn the columns you need to route by into a key, and use the target folder as the value.

Then you iterate over your actual CSV, turn the same columns of each row into a key, and use that key to look up the target folder.
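As a minimal sketch of that idea with in-memory rows instead of files: the values are copied from the question, while `KEY_COLUMNS` and `make_key` are names made up for illustration. It uses a tuple as the key rather than a joined string. That is my own variation, not a requirement, but it avoids accidental collisions if a value ever contains the separator character.

```python
# Hypothetical names: KEY_COLUMNS and make_key are for illustration only.
KEY_COLUMNS = ("dwelling", "wall", "weather", "occ")


def make_key(row: dict) -> tuple:
    # Build a hashable key from the categorical columns only;
    # extra columns such as height/temp are ignored for matching.
    return tuple(row[col] for col in KEY_COLUMNS)


# Rows as csv.DictReader would produce them (all values are strings)
mapping_rows = [
    {"dwelling": "5", "wall": "2", "weather": "Ldn", "occ": "Pen", "folder": "Semi_detached_solid"},
    {"dwelling": "5", "wall": "4", "weather": "Ldn", "occ": "Pen", "folder": "Semi_detached_cavity"},
    {"dwelling": "3", "wall": "4", "weather": "Ldn", "occ": "Pen", "folder": "Detached_cavity"},
]

# key -> target folder
mapping = {make_key(row): row["folder"] for row in mapping_rows}

data_row = {"dwelling": "5", "wall": "4", "weather": "Ldn", "occ": "Pen",
            "height": "172.4", "temp": "28.7"}

# .get() returns None when the combination is not in the mapping
folder = mapping.get(make_key(data_row))
print(folder)
```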

I've used the samples you provided as input files below:

import csv


def csv_dict_to_key(row: dict) -> str:
    # convert the row values into a key
    return f"{row['dwelling']}_{row['wall']}_{row['weather']}_{row['occ']}"


def create_mapping(filename: str) -> dict:
    # go over the mapping file and return a dict
    # which has the mapping
    result = {}
    with open(filename, newline="") as infile:  # newline="" is recommended by the csv docs
        reader = csv.DictReader(infile)
        for row in reader:
            key = csv_dict_to_key(row)
            if key in result:
                print(f"FOUND DUPLICATE {key}")
            result[key] = row["folder"]
    return result


mapping = create_mapping("mapping.csv")


with open("data.csv", newline="") as infile:  # newline="" is recommended by the csv docs
    reader = csv.DictReader(infile)
    for row in reader:
        key = csv_dict_to_key(row)
        if key in mapping:
            # do whatever you need here, I'm just using print as an example
            print(f"----\nROW:    {row}\nKEY:    {key}\nTARGET: {mapping[key]}")
        else:
            print(f"----\nKEY {key} not found in mapping")

Output:

----
ROW:    {'dwelling': '5', 'wall': '2', 'weather': 'Ldn', 'occ': 'Pen', 'height': '154.7', 'temp': '23.4'}
KEY:    5_2_Ldn_Pen
TARGET: Semi_detached_solid
----
ROW:    {'dwelling': '5', 'wall': '4', 'weather': 'Ldn', 'occ': 'Pen', 'height': '172.4', 'temp': '28.7'}
KEY:    5_4_Ldn_Pen
TARGET: Semi_detached_cavity
----
ROW:    {'dwelling': '3', 'wall': '4', 'weather': 'Ldn', 'occ': 'Pen', 'height': '183.5', 'temp': '21', None: ['2']}
KEY:    3_4_Ldn_Pen
TARGET: Detached_cavity
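
If "do whatever you need" means writing each matched row (including the extra columns such as height and temp) into its target folder, one possible sketch follows. The function names `load_mapping` and `route_rows`, the `out_root` argument and the output file name `routed_rows.csv` are all assumptions of mine, not something from the question, so adjust them to your layout. Note that the file is opened in "w" mode, so an existing file with that name in the folder would be overwritten.

```python
import csv
from contextlib import ExitStack
from pathlib import Path

# Columns used for matching (taken from the question's samples)
KEY_COLUMNS = ("dwelling", "wall", "weather", "occ")


def load_mapping(mapping_path):
    # Read the lookup CSV into a dict: (dwelling, wall, weather, occ) -> folder
    with open(mapping_path, newline="") as infile:
        return {
            tuple(row[col] for col in KEY_COLUMNS): row["folder"]
            for row in csv.DictReader(infile)
        }


def route_rows(data_path, mapping_path, out_root, out_name="routed_rows.csv"):
    # Hypothetical helper: writes each matched row to <out_root>/<folder>/<out_name>
    # and returns the number of rows whose key was not in the mapping.
    mapping = load_mapping(mapping_path)
    writers = {}  # folder name -> csv.DictWriter, opened once per folder
    unmatched = 0
    with ExitStack() as stack, open(data_path, newline="") as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            folder = mapping.get(tuple(row[col] for col in KEY_COLUMNS))
            if folder is None:
                unmatched += 1
                continue
            if folder not in writers:
                target_dir = Path(out_root) / folder
                target_dir.mkdir(parents=True, exist_ok=True)
                # ExitStack closes every opened output file when the block ends
                outfile = stack.enter_context(
                    open(target_dir / out_name, "w", newline="")
                )
                writer = csv.DictWriter(
                    outfile,
                    fieldnames=reader.fieldnames,
                    extrasaction="ignore",  # drop surplus fields (the None key)
                )
                writer.writeheader()
                writers[folder] = writer
            writers[folder].writerow(row)
    return unmatched
```

You would call it as something like `route_rows("data.csv", "mapping.csv", "results")`. Keeping one open writer per folder avoids reopening a file for each of the ~500,000 rows; if you have a very large number of distinct folders you may hit the operating system's open-file limit and need a different strategy. The `None: ['2']` entry in the last row of the output above comes from the surplus field in `183.5,21,2`: `csv.DictReader` collects extra fields under the key `None`, which is why the sketch passes `extrasaction="ignore"` to the writer.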