Home > Mobile >  How to merge two csv files based on first column (NO HEADERS, NO PANDAS)
How to merge two csv files based on first column (NO HEADERS, NO PANDAS)

Time:05-09

I have two csv files that I need to merge based on the first column (which is column 0). I cannot give them headers, and I cannot use pandas. Here are the two files

StudentsMajorsList.csv

305671,Jones,Bob,Electrical Engineering,
987621,Wong,Chen,Computer Science,
323232,Rubio,Marco,Computer Information Systems,
564321,Awful,Student,Computer Science,Y
769889,Boy,Sili,Computer Information Systems,Y
156421,McGill,Tom,Electrical Engineering,
999999,Genius,Real,Physics,

GPAList.csv

156421,3.4
305671,3.1
323232,3.8
564321,2.2
769889,3.9
987621,3.85
999999,4

I want the resulting csv file to appear like this

FullRoster.csv

305671,Jones,Bob,Electrical Engineering,3.1
987621,Wong,Chen,Computer Science,3.85
323232,Rubio,Marco,Computer Information Systems,3.8
564321,Awful,Student,Computer Science,Y,2.2
769889,Boy,Sili,Computer Information Systems,Y,3.9
156421,McGill,Tom,Electrical Engineering,3.4
999999,Genius,Real,Physics,4

What code can I use in order to achieve this. Please remember that pandas is not allowed, and I cannot give the files headers to make things easier. I have to use them exactly as they are. Thank you!

EDIT: I APOLOGIZE I did not include the code I have. I don't use this cite often and I didn't familiarize myself with the rules before posting. My apologies! Here's what I have so far, but this code does not work:

with open('StudentsMajorsList.csv','r') as f2:
    reader = csv.reader(f2)
    dict2 = {row[0]: row[1:] for row in reader}

with open('GPAList.csv','r') as f1:
    reader = csv.reader(f1)
    dict1 = OrderedDict((row[0], row[1:]) for row in reader)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in dict.items():
        result.setdefault(key, []).extend(value)

with open('FullRoster.csv', 'w') as f:
    w = csv.writer(f)
    for key, value in result.items():
        w.writerow([key]   value)

CodePudding user response:

# usage: merge_csv.py <file1> <file2> <output>
# example: merge_csv.py file1.csv file2.csv file3.csv
import csv
import sys


def merge_csv(file1, file2, output):
    with open(file1, 'r') as f1, open(file2, 'r') as f2, open(output, 'w') as f3:
        reader1 = csv.reader(f1)
        reader2 = csv.reader(f2)
        writer = csv.writer(f3)
        for row1, row2 in zip(reader1, reader2):
            if row1[0] == row2[0]:
                writer.writerow(row1   row2[1:])
            elif row1[0] < row2[0]:
                writer.writerow(row1)
            else:
                writer.writerow(row2)


if __name__ == '__main__':
    merge_csv(sys.argv[1], sys.argv[2], sys.argv[3])
    print('done')

CodePudding user response:

This is how I would of done it:

import csv

with open('StudentsMajorsList.csv', newline='') as file:
    reader = csv.reader(file)
    data1 = list(reader)
    
with open('GPAList.csv', newline='') as file:
    reader = csv.reader(file)
    data2 = list(reader)

merge1 = []
merge2 = []
merge3 = []

for list1 in data1:
    for item in list1:
        x = item.split(',')
    merge1.append(x)
        
for list2 in data2:
    for item in list2:
        x = item.split(',')
    merge2.append(x)
    
for i in range(len(merge1)):
    for j in range(len(merge2)):
        if(merge1[i][0] == merge2[j][0]):
            merge3.append(merge1[i][0:])
            merge3[i].append(merge2[j][1])

for item in merge3:
    for i in item:
        if (i == ''):
            item.remove(i)
            
for item in range(len(merge3)):
    print(merge3[item])
    
with open('FullRoster.csv', 'w') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerows(merge3)

Output:

['305671', 'Jones', 'Bob', 'Electrical Engineering', '3.1']
['987621', 'Wong', 'Chen', 'Computer Science', '3.85']
['323232', 'Rubio', 'Marco', 'Computer Information Systems', '3.8']
['564321', 'Awful', 'Student', 'Computer Science', 'Y', '2.2']
['769889', 'Boy', 'Sili', 'Computer Information Systems', 'Y', '3.9']
['156421', 'McGill', 'Tom', 'Electrical Engineering', '3.4']
['999999', 'Genius', 'Real', 'Physics', '4']

CodePudding user response:

You need to learn about using Python's built in csv library which helps with reading a line of CSV values and converting it into a list.

The approach to this problem is to first read the GPAList values into a dictionary. This allows any ID value to be looked up easily.

The for each row in the student CSV, lookup the required value in the dictionary and append it to the row just read in whilst writing it to the output CSV file.

For example:

import csv

with open('GPAList.csv') as f_gpa:
    csv_gpa = csv.reader(f_gpa)
    gpa = dict(csv_gpa)
    
with open('StudentsMajorsList.csv') as f_students, open('FullRoster.csv', 'w', newline='') as f_roster:
    csv_students = csv.reader(f_students)
    csv_roster = csv.writer(f_roster)
    
    for row in csv_students:
        csv_roster.writerow([*row, gpa[row[0]]])

I suggest you add some print statements to better understand how this works. e.g. print(gpa)

  • Related