Hello, I'm cleaning up my computer, so I fed a huge list of files to HandBrake to compress them. After compression, some files are bigger than the originals. I want to clean that up, so I tried to write a small Python script.
Basically, I have 2 folders containing files with the same names but different sizes. I want to compare each pair and delete the bigger file, so that when I merge the folders I keep only the smaller version of each file.
Here is an example of the folders I have:
- test/Original
    file1.mpg  40 MB
    file2.mpg  2 MB
    file3.mpg  400 MB
    file4.mpg  45 MB
- test/Compressed
    file1.mpg  20 MB
    file2.mpg  2 MB
    file3.mpg  200 MB
    file4.mpg  105 MB
At the end of the script I'd like to have this (or a third folder with the remaining files merged):
- test/Original
    file4.mpg  45 MB
- test/Compressed
    file1.mpg  20 MB
    file2.mpg  2 MB
    file3.mpg  200 MB
I wrote this code and it seems to work, but I'd like to know if there's a better way to do this. I heard of the filecmp module, but I don't understand whether I can get the file size from it.
Plus, I don't understand why I get an indentation error if I uncomment the commented line.
import os

dirA = 'test/a'
dirB = 'test/b'
merged = []
# Collect the regular files of both folders into one list of DirEntry objects
with os.scandir(dirA) as it:
    for entry in it:
        if entry.is_file():
            merged.append(entry)
with os.scandir(dirB) as it:
    for entry in it:
        if entry.is_file():
            merged.append(entry)
# Compare every pair once; entries with the same name are duplicates
for i in range(len(merged)):
    # print('-------------iterating over %s %.2f Mb' % (merged[i].name, merged[i].stat().st_size/1024**2))
    for j in range(i + 1, len(merged)):
        if merged[i].name == merged[j].name:
            print('----DUPLICATE %s %.2f Mb = %s %.2f Mb' % (merged[i].name, merged[i].stat().st_size/1024**2, merged[j].name, merged[j].stat().st_size/1024**2))
            if merged[i].stat().st_size >= merged[j].stat().st_size:
                print('removing %s %.2f Mb' % (merged[i].name, merged[i].stat().st_size/1024**2))
                os.remove(merged[i])
            elif merged[i].stat().st_size < merged[j].stat().st_size:
                print('removing %s %.2f Mb' % (merged[j].name, merged[j].stat().st_size/1024**2))
                os.remove(merged[j])
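On the filecmp part of the question: filecmp only decides whether two files are equal (filecmp.cmp() compares os.stat() signatures by default, or the contents with shallow=False); it does not tell you which copy is smaller. Its dircmp class is still handy for listing the names present in both folders (the common_files attribute), and os.path.getsize() supplies the sizes. A minimal sketch, assuming flat folders without subfolders; find_larger_duplicates is a hypothetical name, and the function only reports, it deletes nothing.

```python
import filecmp
import os


def find_larger_duplicates(dir_a, dir_b):
    """Return the path of the larger copy for every file name found in both folders.

    Hypothetical helper: it only reports, it never deletes anything.
    On a tie the copy in dir_a is reported, mirroring the >= rule in the question.
    """
    larger = []
    # common_files holds the names that are regular files in both folders
    for name in sorted(filecmp.dircmp(dir_a, dir_b).common_files):
        path_a = os.path.join(dir_a, name)
        path_b = os.path.join(dir_b, name)
        if os.path.getsize(path_a) >= os.path.getsize(path_b):
            larger.append(path_a)
        else:
            larger.append(path_b)
    return larger
```

With the example folders above, find_larger_duplicates('test/Original', 'test/Compressed') should report the originals of file1, file2 and file3 and the compressed file4; you can then pass each path to os.remove() once you are happy with the list.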
Answer:
Deleting files based on size
This is a simple procedure and can be implemented with a few short functions.
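import os

def compare_folders(path1, path2):
    # Skip macOS metadata files; "." and ".." are listed only defensively,
    # os.listdir() never returns them.
    ignore = [".", "..", ".DS_Store"]
    files2 = set(os.listdir(path2))  # list the second folder only once
    for file in os.listdir(path1):
        if file in files2 and file not in ignore:
            delete_larger_file(os.path.join(path1, file), os.path.join(path2, file))

def merge_folders(path1, path2):
    # Move every file that exists only in path1 into path2
    files2 = set(os.listdir(path2))
    for file in os.listdir(path1):
        if file not in files2:
            os.rename(os.path.join(path1, file), os.path.join(path2, file))

def delete_larger_file(path1, path2):
    # Keep the smaller file; on a tie the copy in path2 is deleted
    if os.path.getsize(path1) > os.path.getsize(path2):
        os.remove(path1)
    else:
        os.remove(path2)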
What's going on here?
- The first function, compare_folders(), takes the paths of the two folders being compared. It goes through the files in the first folder and, for every name that also exists in the second folder, calls the other function, delete_larger_file(), which compares the sizes of the two files and deletes the larger one.
- A subsequent call to merge_folders() is needed to merge the folders in place: it moves every file from the first folder that does not exist in the second folder into the second one. In the end, the first folder should be empty and the second one should contain only the smaller copy of each file.
- Be warned: this cannot be undone, so test it on a copy of your folders first. Also, it does not handle subfolders; that would require recursion.

First call compare_folders(), then call merge_folders().
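Since the operation cannot be undone and does not handle subfolders, here is a hedged sketch that addresses both points: it walks the first tree with os.walk(), looks for a file at the same relative path in the second tree, and by default only returns the list of files it would delete. The name smaller_copy_cleanup and the dry_run parameter are hypothetical, not part of the code above. It covers only the deletion step; merging nested folders would still need separate handling.

```python
import os


def smaller_copy_cleanup(dir_a, dir_b, dry_run=True):
    """Delete the larger copy of every file that exists at the same relative
    path in both trees, descending into subfolders.

    Hypothetical helper. With dry_run=True (the default) nothing is touched;
    it only returns the paths it would delete. On a tie the copy in dir_a
    is chosen.
    """
    ignore = {".DS_Store"}
    to_delete = []
    for root, dirs, files in os.walk(dir_a):
        dirs.sort()  # walk subfolders in a predictable order
        rel = os.path.relpath(root, dir_a)  # "." for the top level
        for name in sorted(files):
            if name in ignore:
                continue
            path_a = os.path.join(root, name)
            path_b = os.path.normpath(os.path.join(dir_b, rel, name))
            if not os.path.isfile(path_b):
                continue  # no counterpart in dir_b, nothing to compare
            if os.path.getsize(path_a) >= os.path.getsize(path_b):
                to_delete.append(path_a)
            else:
                to_delete.append(path_b)
    if not dry_run:
        for path in to_delete:
            os.remove(path)
    return to_delete
```

Print the returned list, check it, and only then call the function again with dry_run=False.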
Answer:
I'm posting the complete code in case anyone needs it. Thanks to @drow339 for it!
import os

path1 = 'test1/a'
path2 = 'test1/b'
print('Comparing the folders: %s and %s' % (path1, path2))

def compare_folders(path1, path2):
    ignore = [".", "..", ".DS_Store"]  # skip these names (os.listdir() never returns "." or "..")
    for file in os.listdir(path1):
        print('Checking duplicates for %s' % file)
        if file in os.listdir(path2):
            print('ok')
            if file not in ignore:
                print('Duplicate found: %s <---------' % file)
                delete_larger_file(os.path.join(path1, file), os.path.join(path2, file))

def delete_larger_file(path1, path2):
    # Keep the smaller file; on a tie the copy in path1 is deleted
    if os.path.getsize(path1) >= os.path.getsize(path2):
        print('Duplicates: %s - %s - deleting the first one' % (path1, path2))
        os.remove(path1)
    else:
        print('Duplicates: %s - %s - deleting the second one' % (path1, path2))
        os.remove(path2)

def merge_folders(path1, path2):
    # Move every file that exists only in path1 into path2
    for file in os.listdir(path1):
        if file not in os.listdir(path2):
            os.rename(os.path.join(path1, file), os.path.join(path2, file))

compare_folders(path1, path2)
merge_folders(path1, path2)
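The question also mentions ending up with a third folder holding the merged files. A possible sketch, assuming compare_folders() has already removed the duplicates: it moves files with shutil.move(), which, unlike os.rename(), also works when the destination is on another drive or filesystem, and it refuses to overwrite a name that already exists in the destination. The name merge_into and its reporting of skipped files are hypothetical choices, not taken from the code above; subfolders are ignored.

```python
import os
import shutil


def merge_into(dest, *sources):
    """Move every regular file from each source folder into dest.

    Hypothetical helper. If a name already exists in dest, the file is left
    where it is and its path is returned instead of being overwritten.
    """
    os.makedirs(dest, exist_ok=True)
    skipped = []
    for src in sources:
        for name in sorted(os.listdir(src)):
            path = os.path.join(src, name)
            if not os.path.isfile(path):
                continue  # subfolders are not handled in this sketch
            target = os.path.join(dest, name)
            if os.path.exists(target):
                skipped.append(path)  # never overwrite an existing file
                continue
            shutil.move(path, target)
    return skipped
```

For example, merge_into('test1/merged', 'test1/a', 'test1/b') would gather the remaining files of both folders in test1/merged; a non-empty return value means some names were still duplicated.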