I'm trying to compare two CSV files and write the entries they have in common to a third CSV file. For some reason the outer loop, for row in csv_input, runs over every row, but the inner loop, for entry in csv_compare, runs through only once and then stays stuck on the last entry. I want to compare every row of input.csv with every entry of compare.csv.
import csv
finalCSV = {}
with open('input.csv', newline='') as csvfile, open('compare.csv', newline='') as keyCSVFile, open('output.csv', 'w', newline='') as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_compare = csv.reader(keyCSVFile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))
    for row in csv_input:
        for entry in csv_compare:
            print(row[0] + ' ' + entry[0])
            if row[0] == entry[0]:
                csv_output.writerow(row)
                break
print('wait...')
CodePudding user response:
When you break out of the inner loop and start the next iteration of the outer loop, csv_compare doesn't reset to the beginning; it picks up where it left off. Once the iterator is exhausted, it has nothing left to yield, so the inner loop body never runs again.
You need to reset the iterator at the top of each iteration of the outer loop, which is most easily done by reopening the file there.
import csv
with open('input.csv', newline='') as csvfile, open('output.csv', 'w', newline='') as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))
    for row in csv_input:
        # Reopen compare.csv for every input row so the reader starts from the top
        with open('compare.csv', newline='') as keyCSVFile:
            csv_compare = csv.reader(keyCSVFile)
            for entry in csv_compare:
                if row[0] == entry[0]:
                    csv_output.writerow(row)
                    break
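The exhaustion described above can be reproduced without any files. This is a minimal sketch that uses io.StringIO as a stand-in for compare.csv; the sample data and variable names are made up for illustration. It also shows seek(0) on the underlying file as a possible alternative to reopening it.

```python
import csv
import io

# In-memory stand-in for compare.csv (hypothetical sample data).
compare_file = io.StringIO("a\nb\nc\n")
reader = csv.reader(compare_file)

first_pass = [entry[0] for entry in reader]
second_pass = [entry[0] for entry in reader]  # the reader is already exhausted

# Rewinding the underlying file lets a fresh reader start from the top again.
compare_file.seek(0)
third_pass = [entry[0] for entry in csv.reader(compare_file)]

print(first_pass)
print(second_pass)
print(third_pass)
```

The second pass comes back empty, which is the same thing that happens to the inner loop in the question after its first full run.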
CodePudding user response:
I suggest reading the first column from csv_compare into a list or a set and then using a single for loop:
import csv
finalCSV = {}
with open("input.csv", newline="") as csvfile, open(
    "compare.csv", newline=""
) as keyCSVFile, open("output.csv", "w", newline="") as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_compare = csv.reader(keyCSVFile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))
    compare = {entry[0] for entry in csv_compare}  # <--- read csv_compare to a set
    for row in csv_input:
        if row[0] in compare:  # <--- use `in` operator
            csv_output.writerow(row)
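To try the set-based approach without touching the disk, the same logic can be wrapped in a function that takes open file objects. This is only a sketch: the function name filter_common_rows and the sample data are hypothetical, and io.StringIO objects stand in for the three CSV files.

```python
import csv
import io


def filter_common_rows(input_file, compare_file, output_file):
    """Copy the header, then every input row whose first column is in compare_file."""
    # One pass over compare_file builds the lookup set.
    keys = {entry[0] for entry in csv.reader(compare_file)}
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    writer.writerow(next(reader))  # header row of the input
    for row in reader:
        if row[0] in keys:
            writer.writerow(row)


# Hypothetical sample data standing in for input.csv and compare.csv.
input_file = io.StringIO("id,name\n1,ann\n2,bob\n3,cy\n")
compare_file = io.StringIO("3\n1\n9\n")
output_file = io.StringIO()

filter_common_rows(input_file, compare_file, output_file)
result_lines = output_file.getvalue().splitlines()
print(result_lines)
```

Each file is read exactly once here, and the order of the rows in compare.csv does not matter because membership is checked against the set.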
CodePudding user response:
You could skip the inner loop completely. A row from input.csv is written when its first column matches any of the first-column values in compare.csv, so put those values in a set for easy lookup.
import csv
with open('compare.csv', newline='') as keyCSVFile:
    key_set = {row[0] for row in csv.reader(keyCSVFile)}
with open('input.csv', newline='') as csvfile, open('output.csv', 'w', newline='') as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))
    csv_output.writerows(row for row in csv_input if row[0] in key_set)
del key_set
print('wait...')
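One thing to watch for when building the set: csv.reader yields an empty list for a blank line, so row[0] would raise IndexError if compare.csv contains one. A small guard avoids that. This is a sketch with made-up sample data, using io.StringIO in place of the real file.

```python
import csv
import io

# Hypothetical compare.csv content with a blank line in the middle.
compare_file = io.StringIO("1\n\n3\n")

# The blank line comes back as []; `if row` skips it before indexing.
key_set = {row[0] for row in csv.reader(compare_file) if row}
print(key_set)
```

The same `if row` check may be worth adding to the generator that filters input.csv if that file can contain blank lines too.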