Compare and print different lines of files with the same name from two different directories

Time:02-02

I am writing a script to compare and print files in two different directories.

But I also want to compare the contents of files with the same name and print any lines that differ. For example:

cd1
txt1.txt
txt2.txt

cd2
txt1.txt
txt2.txt
txt3.txt

cd1\txt1.txt contains:

line1
line2
line3
line4
line6

while cd2\txt1.txt contains:

line1
line2
line3
line5

I want to be able to print something like:

"Lines in cd1\txt1.txt different than cd2\txt1.txt are: line4 and line6"
"Lines in cd2\txt1.txt different than cd1\txt1.txt are: line5"

My code so far:

########## COMPARE DIRECTORIES ####################

import sys, os

files1 = []
files2 = []

#path1 = input("Please enter the first directory: ")
#path2 = input("Please enter the second directory: ")

path1 = r"C:\Users\Desktop\WBM_V3_events_translated" 
path2 = r"C:\Users\Desktop\WBM_V3_events_blocked"

for path, subdirs, files in os.walk(path1):
    for name in files:
        files1.append(name)
        
for path, subdirs, files in os.walk(path2):
    for name in files:
        files2.append(name)
        
print("                          ")
print("Printing sent FOR_TRANSLATION dir files. Total:", len(files1))
print("                          ")
for name in files1:
    print(os.path.join(path1, name))

print("                          ")
print("                          ")
print("Printing returned TRANSLATED dir files. Total:", len(files2))
print("                          ")
for name in files2:
    print(os.path.join(path2, name))
   
   
distinct_files = []

# Files present under path2 but not under path1 (compared by name only)
for file in files2:
    if file not in files1:
        distinct_files.append(file)

print("                          ")
print("Printing the files in FOR_TRANSLATION dir, NOT IN the returned TRANSLATED dir. Total:", len(distinct_files))
print("                          ")
for name in distinct_files:
    print(os.path.join(path2, name))
    
########## COMPARING FILE CONTENTS ####################

lines1 = []
lines2 = []

x = 0
y = 0

lines_3 = []
lines_4 = []

for name in files1:
    for line in name:
        lines1.append(line)
        x += 1
       
for name in files2:
    for line in name:
        lines2.append(line)
        y += 1

for line in lines1:
    if line not in lines2:
        if not line.isspace():
            lines_3.append(line)

for line in lines2:
    if line not in lines1:
        if not line.isspace():
            lines_4.append(line)

print("                          ")
print("Lines in AAAAAAAAA. Total:", x)
print("                          ")
print(lines_3)

print("                          ")
print("Lines in ZZZZZZZZZ. Total:", y)
print("                          ")
print(lines_4)

The last print statements produce:

Lines in AAAAAAAAA. Total: 103

[]

Lines in ZZZZZZZZZ. Total: 180

['C', 'E', 'r']

CodePudding user response:

I think you should start by creating a method that takes two strings of text and compares them line by line. At the moment, things are getting a little muddled.
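
A minimal sketch of such a function, assuming that a line counts as "different" when it appears in one text but nowhere in the other, and that blank lines are ignored (as in your isspace() check). The name diff_lines is made up for illustration:

```python
def diff_lines(text_a, text_b):
    """Return (only_in_a, only_in_b) for two multi-line strings.

    Assumption: a line is "different" when it occurs in one text but
    nowhere in the other; line order and duplicates are not compared.
    """
    # Drop blank lines and line endings before comparing
    lines_a = [line for line in text_a.splitlines() if line.strip()]
    lines_b = [line for line in text_b.splitlines() if line.strip()]
    set_a, set_b = set(lines_a), set(lines_b)
    only_in_a = [line for line in lines_a if line not in set_b]
    only_in_b = [line for line in lines_b if line not in set_a]
    return only_in_a, only_in_b


only_1, only_2 = diff_lines("line1\nline2\nline3\nline4\nline6\n",
                            "line1\nline2\nline3\nline5\n")
print(only_1)  # ['line4', 'line6']
print(only_2)  # ['line5']
```

Working on plain strings keeps the comparison easy to test on its own, before any directory or file handling is involved.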

To see where it gets muddled, note that files1 is a list of file names:

files1 = []
for path, subdirs, files in os.walk(path1):
    for name in files:
        files1.append(name)

but you then use files1 as if it were a list of file contents:

for name in files1:
    for line in name:
        lines1.append(line)
        x += 1

In that block, line is a single character of the current file name, not a line of the file's contents. That is why your output contains single characters such as 'C', 'E' and 'r'. You need to open() each file and read it:

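for name in files1:
    # files1 holds bare file names, so join each one with its directory first
    # (this assumes the file sits directly in path1, not in a subdirectory)
    with open(os.path.join(path1, name), "rt") as file_in:
        for line in file_in:
            lines1.append(line)
            x += 1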
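
Putting the pieces together, here is one possible sketch of the per-file comparison you describe. It assumes both directories are flat (no subdirectories), that the files are text readable with the default encoding, and that a line is "different" when it does not occur anywhere in the other file. The function names read_lines, compare_dirs and print_report are hypothetical:

```python
import os


def read_lines(path):
    """Return the non-blank lines of a text file, without line endings."""
    with open(path, "rt") as file_in:
        return [line.rstrip("\n") for line in file_in if line.strip()]


def list_files(directory):
    """Return the names of the regular files directly inside directory."""
    return {name for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))}


def compare_dirs(dir1, dir2):
    """Map each file name found in both dirs to (only_in_dir1, only_in_dir2)."""
    result = {}
    # Only names present in both directories are compared
    for name in sorted(list_files(dir1) & list_files(dir2)):
        lines1 = read_lines(os.path.join(dir1, name))
        lines2 = read_lines(os.path.join(dir2, name))
        only1 = [line for line in lines1 if line not in lines2]
        only2 = [line for line in lines2 if line not in lines1]
        result[name] = (only1, only2)
    return result


def print_report(dir1, dir2):
    """Print the differing lines for every file name shared by both dirs."""
    for name, (only1, only2) in compare_dirs(dir1, dir2).items():
        file1 = os.path.join(dir1, name)
        file2 = os.path.join(dir2, name)
        print("Lines in", file1, "different than", file2, "are:", only1)
        print("Lines in", file2, "different than", file1, "are:", only2)
```

Calling print_report(path1, path2) with your two directories prints one pair of messages per shared file name. If your directories are nested, you would need os.walk and paths relative to each root instead of os.listdir.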