I have 2 lists stored in files (File1.txt and File2.txt). I need to compare every line of File1 with every line of File2 and write the number of values each pair of lines has in common to Output.txt.
However, the result written to the output is incorrect. Below are the code I used, the result I got, and the result I want:
input_file1 = open('C:\\Temp\\File1.txt', 'r')
input_file2 = open('C:\\Temp\\File2.txt', 'r')
output_file = open('C:\\Temp\\Output.txt','w')
match = 0
strOutput = ""
for line1 in input_file1:
    LST1 = list(line1)
    input_file2.seek(0)
    output_file.write('\n')
    for line2 in input_file2:
        LST2 = list(line2)
        match = len(set(LST1).intersection(set(LST2)))
        strOutput = str(match) + ',' + line2
        output_file.write("%s" %(strOutput))
output_file.close()
input_file2.close()
input_file1.close()
input_file1:
01,04,07,23,39
03,05,08,37,45
02,03,10,13,28
input_file2:
01,02,03,21,22,23,27
03,05,10,13,37,39,47
output (INCORRECT!):
7,01,02,03,21,22,23,27
7,03,05,10,13,37,39,47
5,01,02,03,21,22,23,27
6,03,05,10,13,37,39,47
5,01,02,03,21,22,23,27
4,03,05,10,13,37,39,47
output (CORRECT):
2,01,02,03,21,22,23,27
1,03,05,10,13,37,39,47
1,01,02,03,21,22,23,27
3,03,05,10,13,37,39,47
2,01,02,03,21,22,23,27
3,03,05,10,13,37,39,47
OR, with the leading zeros removed:
output (CORRECT):
2,1,2,3,21,22,23,27
1,3,5,10,13,37,39,47
1,1,2,3,21,22,23,27
3,3,5,10,13,37,39,47
2,1,2,3,21,22,23,27
3,3,5,10,13,37,39,47
Answer:
Here is the fixed code:
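input_file1 = open('File1.txt', 'r')
input_file2 = open('File2.txt', 'r')
output_file = open('Output.txt', 'w')
match = 0
strOutput = ""
for line1 in input_file1:
    # split the line into its comma-separated values, without the newline
    LST1 = line1.strip().split(',')
    input_file2.seek(0)  # rewind File2 for every line of File1
    for line2 in input_file2:
        LST2 = line2.strip().split(',')
        # number of values the two lines have in common
        match = len(set(LST1).intersection(set(LST2)))
        # use the stripped line2 and add exactly one '\n', so the output is
        # correct even if the last line of File2 has no trailing newline
        strOutput = str(match) + ',' + line2.strip() + '\n'
        output_file.write(strOutput)
output_file.close()
input_file2.close()
input_file1.close()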
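As a side note, the same approach can be written a bit more idiomatically: with blocks close the files automatically, and File2 is read only once instead of being rewound for every line of File1. This is a sketch, not the code above; the function names count_common and write_counts are made up for illustration, it assumes File2 is small enough to keep in memory, and the f-string needs Python 3.6 or later.

```python
def count_common(lines1, lines2):
    """For every pair (line1, line2), return 'count,line2', where count is
    the number of comma-separated values the two lines share."""
    # Parse File2 once: keep the stripped text for output and a set for matching
    rows2 = []
    for line2 in lines2:
        text2 = line2.strip()
        if text2:  # skip blank lines
            rows2.append((text2, set(text2.split(','))))

    result = []
    for line1 in lines1:
        text1 = line1.strip()
        if not text1:
            continue  # skip blank lines
        values1 = set(text1.split(','))
        for text2, values2 in rows2:
            # & is set intersection: the values both lines contain
            result.append(f"{len(values1 & values2)},{text2}")
    return result


def write_counts(path1, path2, out_path):
    """Compare the two input files and write one result per line to out_path."""
    with open(path1) as file1, open(path2) as file2:
        output_lines = count_common(file1, file2)
    with open(out_path, 'w') as out:
        for line in output_lines:
            out.write(line + '\n')


# Example call with the paths from the question (not run here):
# write_counts('C:\\Temp\\File1.txt', 'C:\\Temp\\File2.txt', 'C:\\Temp\\Output.txt')
```

Because count_common accepts any iterable of strings, it can be tried without files: count_common(['01,04,07,23,39'], ['01,02,03,21,22,23,27']) returns ['2,01,02,03,21,22,23,27'].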
There are two main issues in the original code:
1- The original line LST1 = list(line1) doesn't split the line into values. list() turns a string into a list of its individual characters, so you end up with ['0', '1', ',', '0', '4', ...] instead of ['01', '04', ...].
2- In your original files, every line ends with a newline character, so line1 looks like this: '01,04,07,23,39\n'. After splitting, that newline stays attached to the last value, so '39\n' and '39' are counted as different values.
To solve this, we call .strip(), which removes the newline (and any other leading or trailing whitespace).
With both of those changes, you end up with this line:
LST1 = line1.strip().split(',')
where .strip() removes the newline character, and .split(',') splits the line at each comma into its values.
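Both issues are easy to see in an interactive session. This sketch hard-codes the first line of each file from the question as string literals instead of reading the files:

```python
# First lines of File1.txt and File2.txt from the question, as they are
# read from the files (each ends with a newline character)
line1 = '01,04,07,23,39\n'
line2 = '01,02,03,21,22,23,27\n'

chars = list(line1)                # single characters, not values
raw_values = line1.split(',')      # the newline stays attached to the last value
values = line1.strip().split(',')  # newline removed first, then split

print(chars[:5])    # ['0', '1', ',', '0', '4']
print(raw_values)   # ['01', '04', '07', '23', '39\n']
print(values)       # ['01', '04', '07', '23', '39']

# values both lines have in common
common = set(values).intersection(line2.strip().split(','))
print(sorted(common), len(common))  # ['01', '23'] 2
```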
Running the fixed code gave me this output:
2,01,02,03,21,22,23,27
1,03,05,10,13,37,39,47
1,01,02,03,21,22,23,27
3,03,05,10,13,37,39,47
2,01,02,03,21,22,23,27
3,03,05,10,13,37,39,47
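The question also lists a second acceptable format, with the leading zeros removed. One way to get it, sketched here under the assumption that every value in both files is an integer, is to convert each value with int() before comparing. A side effect is that '01' and '1' then count as the same value. The function name count_common_ints is hypothetical.

```python
def count_common_ints(lines1, lines2):
    """Compare lines as sets of integers, so output values have no
    leading zeros ('01' becomes 1)."""
    # Parse File2 once into lists of ints, skipping blank lines
    rows2 = [[int(v) for v in line.strip().split(',')]
             for line in lines2 if line.strip()]

    result = []
    for line1 in lines1:
        if not line1.strip():
            continue  # skip blank lines
        values1 = {int(v) for v in line1.strip().split(',')}
        for values2 in rows2:
            match = len(values1.intersection(values2))
            # e.g. match=2, values2=[1, 2, 3] -> '2,1,2,3'
            result.append(','.join(str(n) for n in [match] + values2))
    return result
```

Each returned string can be written to Output.txt followed by '\n', the same way as in the fixed code above.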