I am writing a simple text comparison tool. It takes two text files - a template and a target - and compares each character in each line using two for-loops. Any differences are highlighted with a Unicode full block symbol (\u2588). When the target line is longer than the template line, I use itertools.zip_longest to fill in the non-existent characters with a fill value.
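For reference, this is how zip_longest pairs up two strings of different length (the strings here are made up for illustration). None is already the default fillvalue, so passing it explicitly does not change anything:

```python
from itertools import zip_longest

# The shorter iterable is padded with fillvalue until the longer one is exhausted.
pairs = list(zip_longest("line", "lineXX", fillvalue=None))
print(pairs)
# [('l', 'l'), ('i', 'i'), ('n', 'n'), ('e', 'e'), (None, 'X'), (None, 'X')]
```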
from itertools import zip_longest

def compare(filename1, filename2):
    file1 = open(filename1, "r")
    file2 = open(filename2, "r")
    for line1, line2 in zip_longest(file1, file2):
        for char1, char2 in zip_longest(line1, line2, fillvalue=None):
            if char1 == char2:
                print(char2, end='')
            elif char1 == None:
                print('\u2588', end='')

compare('template.txt', 'target.txt')
Template file:    Target file:
First line        First lineXX
Second line       Second line
Third line        Third line
However, this appears to mess with Python's automatic line break placement. When a line ends with such a fill value, a line break is not generated, giving this result:
First line██Second line
Third line
Instead of:
First line██
Second line
Third line
The issue persisted after rewriting the script to use .append and .join (not shown to keep it short), but printing each line as a list let me pinpoint it:
Result when both files are identical:
['F', 'i', 'r', 's', 't', ' ', 'l', 'i', 'n', 'e', '\n']
First line
['S', 'e', 'c', 'o', 'n', 'd', ' ', 'l', 'i', 'n', 'e', '\n']
Second line
['T', 'h', 'i', 'r', 'd', ' ', 'l', 'i', 'n', 'e']
Third line
Result when first line of target file has two more characters:
['F', 'i', 'r', 's', 't', ' ', 'l', 'i', 'n', 'e', '█', '█']
First line██['S', 'e', 'c', 'o', 'n', 'd', ' ', 'l', 'i', 'n', 'e', '\n']
Second line
['T', 'h', 'i', 'r', 'd', ' ', 'l', 'i', 'n', 'e']
Third line
As you can see, Python automatically adds a line break \n if the lines are of identical length, but as soon as zip_longest is involved, the last character in the list is the block, not a line break. Why does this happen?
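Listing only the positions where the two first lines differ shows what the inner loop actually compares. This is a minimal sketch: the strings are stand-ins for the first line of each file, written the way iterating over a file object returns them, with the trailing "\n" included.

```python
from itertools import zip_longest

# Stand-ins for the first line of template.txt and target.txt.
template_line = "First line\n"
target_line = "First lineXX\n"

# Keep only the positions where the two characters differ.
diffs = [(i, c1, c2)
         for i, (c1, c2) in enumerate(zip_longest(template_line, target_line))
         if c1 != c2]
for d in diffs:
    print(d)
# (10, '\n', 'X')
# (11, None, 'X')
# (12, None, '\n')
```

In the loop from the question, position 10 is a mismatch where char1 is not None, so nothing is printed there; positions 11 and 12 both have char1 == None, so each gets a block, including the position that holds the target's own "\n".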
Answer:
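Python does not add line breaks on its own: each line you read from a file keeps its trailing "\n", so that character takes part in the comparison. In your example, the template's "\n" lines up with the first "X" of the target (a mismatch, so nothing is printed), and the target's own "\n" lines up with a fill value, so it is printed as a block instead of a line break. Strip the line breaks before comparing characters and print a new line explicitly after each line: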
from itertools import zip_longest

def compare(filename1, filename2):
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        # fillvalue="" keeps .rstrip() working when one file has more lines
        for line1, line2 in zip_longest(file1, file2, fillvalue=""):
            line1, line2 = line1.rstrip("\n"), line2.rstrip("\n")  # <- HERE
            for char1, char2 in zip_longest(line1, line2):
                if char1 == char2:
                    print(char2, end='')
                elif char1 is None:
                    print('\u2588', end='')
            print()  # <- HERE

compare('template.txt', 'target.txt')
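If you want the result as a string (for example to test it or write it to a file), here is a minimal sketch of a variant. It assumes that "any differences" should include every position where the characters are not equal, so a changed character or a character missing from the target also becomes a block, unlike the code above, which only marks extra characters in the target. compare_lines and BLOCK are hypothetical names, and io.StringIO stands in for the two files.

```python
import io
from itertools import zip_longest

BLOCK = "\u2588"  # full block used to mark differences

def compare_lines(template, target):
    """Return target's text with every differing position replaced by BLOCK.

    Assumption: a position differs whenever the two characters are not equal,
    including positions that exist in only one of the two lines.
    """
    out = []
    # fillvalue="" lets the files have a different number of lines.
    for line1, line2 in zip_longest(template, target, fillvalue=""):
        # Drop only the line terminator so it never takes part in the comparison.
        line1, line2 = line1.rstrip("\n"), line2.rstrip("\n")
        for char1, char2 in zip_longest(line1, line2):
            out.append(char2 if char1 == char2 else BLOCK)
        out.append("\n")  # one explicit line break per pair of lines
    return "".join(out)

template = io.StringIO("First line\nSecond line\nThird line")
target = io.StringIO("First lineXX\nSecond line\nThird line")
print(compare_lines(template, target), end="")
```

With these inputs it prints "First line██", "Second line" and "Third line" on three separate lines. Because the line break is added explicitly instead of being compared, a longer target line can no longer swallow it.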