Assume that I have two folders with 1000 text files in them, for example, folder 1 and folder 2.
Those two folders have text files with the same names, for example:
folder 1: ab.txt, bc.txt, cd.txt, ac.txt, etc.
folder 2: ab.txt, bc.txt, cd.txt, ac.txt, etc.
Each text file contains a bunch of numbers. Here is an example of the text inside one of them; ab.txt from folder 1 has:
5 0.796 0.440 0.407 0.399
24 0.973 0.185 0.052 0.070
3 0.91 0.11 0.12 0.1
and ab.txt from folder 2 has:
1 0.8 0.45 0.407 0.499
24 0.973 0.185 0.052 0.070
5 5.91 6.2 2.22 0.2
I want to read the text files in those two folders and compare the first column of each pair of text files that have the same name (as shown above). If the first columns of the two files contain different numbers, I want to move that file from folder_1 to another folder called "output". Here is what I wrote. I can compare two text files, but how do I compare the identically named text files located in two different folders?
import difflib

with open(r'path to txt file') as file_1:
    file_1_text = file_1.readlines()
with open(r'path to txt file') as file_2:
    file_2_text = file_2.readlines()

# Find and print the diff:
for line in difflib.unified_diff(
        file_1_text, file_2_text, fromfile='file1.txt',
        tofile='file2.txt', lineterm=''):
    print(line)
CodePudding user response:
You can create a list of all files in a folder with os.listdir().
folder1_files = os.listdir(folder_path1)
folder2_files = os.listdir(folder_path2)
Then you can iterate over both lists and check whether the file names are equal.
for file1 in folder1_files:
    for file2 in folder2_files:
        if file1 == file2:
            ...
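With 1000 files per folder, the nested loop performs about a million name comparisons. A set intersection finds the shared names directly. This is a minimal sketch; common_file_names is a hypothetical helper name, not part of the answer above.

```python
import os


def common_file_names(folder_path1, folder_path2):
    """Return the sorted file names that appear in both folders."""
    names1 = set(os.listdir(folder_path1))
    names2 = set(os.listdir(folder_path2))
    # Set intersection keeps only the names found in both folders.
    return sorted(names1 & names2)
```

Each returned name can then be joined to both folder paths with os.path.join.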
Comparing the first line is also not that difficult. Read the lines of both files and check if they are different.
file1_path = os.path.join(folder_path1, file1)
file2_path = os.path.join(folder_path2, file2)
file1_file = open(file1_path, 'r')
file2_file = open(file2_path, 'r')
file1_lines = file1_file.readlines()
file2_lines = file2_file.readlines()
if file1_lines[0] != file2_lines[0]:
    ...
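The question asks about the first column of every row rather than the first line. A rough sketch of that variant follows, assuming the columns are separated by whitespace and blank lines can be ignored; first_column and first_columns_differ are hypothetical names. The values are compared as text here, so "1" and "1.0" count as different.

```python
def first_column(path):
    """Return the first whitespace-separated token of every non-empty line."""
    with open(path, "r") as handle:
        return [line.split()[0] for line in handle if line.strip()]


def first_columns_differ(path1, path2):
    """Return True if the two files do not have identical first columns."""
    return first_column(path1) != first_column(path2)
```

For the two ab.txt examples in the question, the first columns are 5, 24, 3 and 1, 24, 5, so this check would report a difference.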
I would use either shutil.move or shutil.copy to move or copy the files.
shutil.copy(file1_path, "output/" + file1)
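Since the question asks to move the file rather than copy it, here is a small sketch using shutil.move. It assumes the output folder may not exist yet and creates it first; move_to_output is a hypothetical helper name.

```python
import os
import shutil


def move_to_output(file_path, output_dir="output"):
    """Move file_path into output_dir, creating the directory if needed."""
    os.makedirs(output_dir, exist_ok=True)
    destination = os.path.join(output_dir, os.path.basename(file_path))
    # shutil.move removes the file from its original folder;
    # swap in shutil.copy to leave the original in place.
    shutil.move(file_path, destination)
    return destination
```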
Closing the file descriptors
Note: The term "file descriptor" might not be 100% accurate in this context, because open() creates a file object, not a file descriptor. A file object is built on top of a file descriptor, so file.close() does close the underlying descriptor, but the two terms are not interchangeable. Read more here: what is the difference between os.open and os.fdopen in python
file1_file.close()
file2_file.close()
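As an alternative to calling close() by hand, a with statement closes the files automatically, even if reading raises an exception. A minimal sketch, where read_first_lines is a hypothetical helper name:

```python
def read_first_lines(path1, path2):
    """Return the first line of each file, or '' for an empty file."""
    # Both files are closed when the with block exits, even on error.
    with open(path1, "r") as file1, open(path2, "r") as file2:
        return file1.readline(), file2.readline()
```

Using readline() also avoids loading the whole file when only the first line is needed.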
All together in a function:
import os
import shutil


def compare_files(folder_path1, folder_path2):
    folder1_files = os.listdir(folder_path1)
    folder2_files = os.listdir(folder_path2)
    # Create the output folder once, before the loop.
    output_path = "output"
    if not os.path.exists(output_path):
        os.makedirs(output_path)
    for file1 in folder1_files:
        for file2 in folder2_files:
            if file1 == file2:
                file1_path = os.path.join(folder_path1, file1)
                file2_path = os.path.join(folder_path2, file2)
                file1_file = open(file1_path, 'r')
                file2_file = open(file2_path, 'r')
                file1_lines = file1_file.readlines()
                file2_lines = file2_file.readlines()
                file1_file.close()
                file2_file.close()
                # Copy the file from folder 1 if the first lines differ.
                if file1_lines[0] != file2_lines[0]:
                    shutil.copy(file1_path, os.path.join(output_path, file1))


compare_files("folder1", "folder2")
If you want to compare the values as numbers, so that e.g. 1 is treated the same as 1.0, you can do the following.
l1 = file1_lines[0].split()
l2 = file2_lines[0].split()
for i in range(min(len(l1), len(l2))):
    if float(l1[i]) != float(l2[i]):
        output_path = "output"
        if not os.path.exists(output_path):
            os.makedirs(output_path)
        shutil.copy(file1_path, output_path)
        break
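The numeric check can also be wrapped in a small helper. This is only a sketch: it assumes every token on the line parses as a float, and, unlike the loop above, it treats lines with a different number of values as different. lines_differ_numerically is a hypothetical name.

```python
def lines_differ_numerically(line1, line2):
    """Return True if the two lines do not hold the same sequence of numbers."""
    values1 = [float(token) for token in line1.split()]
    values2 = [float(token) for token in line2.split()]
    # List comparison checks both the length and each value in order.
    return values1 != values2
```

With this helper, "1 0.8" and "1.0 0.80" compare as equal, while "1 2" and "1 2 3" do not.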