Home > Software engineering >  Trying to cross reference one CSV to another
Trying to cross reference one CSV to another

Time:07-31

So I have two CSV files, one with a list of names that might be a match or might not match the list of names in another file. But if it does find a match I want it to go back to the original csv1 and a different column value from csv2 from the same row as the match to the second row in csv1.

with open('/Users/{}/Desktop/{}.csv'.format(os.getlogin(), mainFile), mode='r') as csv_file:
csv_reader = csv.DictReader(csv_file)
for i in csv_reader:
    with open('/Users/{}/Desktop/{}.csv'.format(os.getlogin(), projects), mode='r') as new_csv:
        csv_reader = csv.DictReader(new_csv)
        for x in csv_reader:
            if i['Owner'] == x['Project Owner']:
                with open('/Users/{}/Desktop/{}.csv'.format(os.getlogin(), mainFile), mode='r') as csv_writing:
                    writer = csv.writer(csv_writing)
                    writer.writerow([x['Project Owner'], x['Project Name']])

this is what I have so far but I'm worried that it won't know what to do with the rows in csv1 that don't match and since I can loop a writer the best option would be to delete the row from csv1 that doesn't match right? So how would I delete that said row? Or a better solution if you have one.

CodePudding user response:

If you have a common key, you can read both CSVs into a DictReader, convert them to a list of OrderedDicts, then merge the dictionaries based on that common key.

Example:

csv1.csv:

name,food,title
joe,chicken,manager
mike,fish,employee
tom,beef,supervisor
gary,veggie,owner

csv2.csv:

name,city
joe,washington
mike,new york
tom,los angeles
gary,philadelphia

Python Code:

import csv

with open("csv1.csv") as csv1:
    csv1 = list(csv.DictReader(csv1))   # csv1.csv as list of OrderedDict
    with open("csv2.csv") as csv2:
        csv2 = list(csv.DictReader(csv2))   # csv2.csv as list of OrderedDict
        matched = []    # list to hold merged OrderedDicts
        for r1 in csv1: # for each OrderedDict in csv1
            for r2 in csv2: # and each OrderedDict in csv2
                if r1["name"] == r2["name"]:  # if they have the "name" key in common
                    entry = r1.copy()   # merge the OrderedDicts
                    entry.update(r2)
                    matched.append(entry)   # and add them to the list
        print(matched)  # print the new list of OrderedDicts 

CodePudding user response:

There's a Table helper in convtools library (github). The helper simplifies basic operations for stream processing table-like data.

from convtools.contrib.tables import Table
from convtools import conversion as c

Table.from_csv("tmp/main.csv", header=True).join(
    Table.from_csv("tmp/projects.csv", header=True),
    on=c.LEFT.col("Owner") == c.RIGHT.col("Project Owner"),
    how="inner",
).take("Project Owner", "Project Name").into_csv("tmp/results.csv")
  • Related