I have two csv files that I need to merge based on the first column (which is column 0). I cannot give them headers, and I cannot use pandas. Here are the two files
StudentsMajorsList.csv
305671,Jones,Bob,Electrical Engineering,
987621,Wong,Chen,Computer Science,
323232,Rubio,Marco,Computer Information Systems,
564321,Awful,Student,Computer Science,Y
769889,Boy,Sili,Computer Information Systems,Y
156421,McGill,Tom,Electrical Engineering,
999999,Genius,Real,Physics,
GPAList.csv
156421,3.4
305671,3.1
323232,3.8
564321,2.2
769889,3.9
987621,3.85
999999,4
I want the resulting csv file to appear like this
FullRoster.csv
305671,Jones,Bob,Electrical Engineering,3.1
987621,Wong,Chen,Computer Science,3.85
323232,Rubio,Marco,Computer Information Systems,3.8
564321,Awful,Student,Computer Science,Y,2.2
769889,Boy,Sili,Computer Information Systems,Y,3.9
156421,McGill,Tom,Electrical Engineering,3.4
999999,Genius,Real,Physics,4
What code can I use in order to achieve this. Please remember that pandas is not allowed, and I cannot give the files headers to make things easier. I have to use them exactly as they are. Thank you!
EDIT: I APOLOGIZE I did not include the code I have. I don't use this cite often and I didn't familiarize myself with the rules before posting. My apologies! Here's what I have so far, but this code does not work:
with open('StudentsMajorsList.csv','r') as f2:
reader = csv.reader(f2)
dict2 = {row[0]: row[1:] for row in reader}
with open('GPAList.csv','r') as f1:
reader = csv.reader(f1)
dict1 = OrderedDict((row[0], row[1:]) for row in reader)
result = OrderedDict()
for d in (dict1, dict2):
for key, value in dict.items():
result.setdefault(key, []).extend(value)
with open('FullRoster.csv', 'w') as f:
w = csv.writer(f)
for key, value in result.items():
w.writerow([key] value)
CodePudding user response:
# usage: merge_csv.py <file1> <file2> <output>
# example: merge_csv.py file1.csv file2.csv file3.csv
import csv
import sys
def merge_csv(file1, file2, output):
with open(file1, 'r') as f1, open(file2, 'r') as f2, open(output, 'w') as f3:
reader1 = csv.reader(f1)
reader2 = csv.reader(f2)
writer = csv.writer(f3)
for row1, row2 in zip(reader1, reader2):
if row1[0] == row2[0]:
writer.writerow(row1 row2[1:])
elif row1[0] < row2[0]:
writer.writerow(row1)
else:
writer.writerow(row2)
if __name__ == '__main__':
merge_csv(sys.argv[1], sys.argv[2], sys.argv[3])
print('done')
CodePudding user response:
This is how I would of done it:
import csv
with open('StudentsMajorsList.csv', newline='') as file:
reader = csv.reader(file)
data1 = list(reader)
with open('GPAList.csv', newline='') as file:
reader = csv.reader(file)
data2 = list(reader)
merge1 = []
merge2 = []
merge3 = []
for list1 in data1:
for item in list1:
x = item.split(',')
merge1.append(x)
for list2 in data2:
for item in list2:
x = item.split(',')
merge2.append(x)
for i in range(len(merge1)):
for j in range(len(merge2)):
if(merge1[i][0] == merge2[j][0]):
merge3.append(merge1[i][0:])
merge3[i].append(merge2[j][1])
for item in merge3:
for i in item:
if (i == ''):
item.remove(i)
for item in range(len(merge3)):
print(merge3[item])
with open('FullRoster.csv', 'w') as csvfile:
csvwriter = csv.writer(csvfile)
csvwriter.writerows(merge3)
Output:
['305671', 'Jones', 'Bob', 'Electrical Engineering', '3.1']
['987621', 'Wong', 'Chen', 'Computer Science', '3.85']
['323232', 'Rubio', 'Marco', 'Computer Information Systems', '3.8']
['564321', 'Awful', 'Student', 'Computer Science', 'Y', '2.2']
['769889', 'Boy', 'Sili', 'Computer Information Systems', 'Y', '3.9']
['156421', 'McGill', 'Tom', 'Electrical Engineering', '3.4']
['999999', 'Genius', 'Real', 'Physics', '4']
CodePudding user response:
You need to learn about using Python's built in csv
library which helps with reading a line of CSV values and converting it into a list.
The approach to this problem is to first read the GPAList
values into a dictionary. This allows any ID value to be looked up easily.
The for each row in the student CSV, lookup the required value in the dictionary and append it to the row just read in whilst writing it to the output CSV file.
For example:
import csv
with open('GPAList.csv') as f_gpa:
csv_gpa = csv.reader(f_gpa)
gpa = dict(csv_gpa)
with open('StudentsMajorsList.csv') as f_students, open('FullRoster.csv', 'w', newline='') as f_roster:
csv_students = csv.reader(f_students)
csv_roster = csv.writer(f_roster)
for row in csv_students:
csv_roster.writerow([*row, gpa[row[0]]])
I suggest you add some print statements to better understand how this works. e.g. print(gpa)