Compare files with the same name in 2 folders and delete the bigger one in Python

Time:03-07

Hello, I'm cleaning up my computer, and I fed a huge list of files to Handbrake to compress them. After compression, some files are bigger than the originals. I want to clean that up, so I tried to write a small Python script.

Basically, I have 2 folders containing files with the same names but different sizes. I want to compare each pair and delete the bigger file, so that if I merge the folders I keep only the smaller copy of each file.

Here is an example of the folders I have:

- test/Original
 file1.mpg 40Mb
 file2.mpg 2Mb
 file3.mpg 400Mb
 file4.mpg 45Mb

- test/Compressed
 file1.mpg 20Mb
 file2.mpg 2Mb
 file3.mpg 200Mb
 file4.mpg 105Mb

At the end of the script I'd like to have this (or a third folder with the files merged):

- test/Original
 file4.mpg 45Mb

- test/Compressed
 file1.mpg 20Mb
 file2.mpg 2Mb
 file3.mpg 200Mb

I wrote this code and it seems to work, but I'd like to know if there's a better way of doing this. I heard of a function called filecompare, but I don't understand whether I can get the file size from it.

Also, I don't understand why I get an indent error if I uncomment the commented-out line.

import os

dirA = 'test/a'
dirB = 'test/b'
merged = []

with os.scandir(dirA) as it:
    for entry in it:
        if entry.is_file():
            merged.append(entry)

with os.scandir(dirB) as it:
    for entry in it:
        if entry.is_file():
            merged.append(entry)


for i in range(len(merged)):
#   print('-------------iterating over %s' % (merged[i].name,merged[i].stat().st_size/1024**2))
    for j in range(i + 1, len(merged)):
        if str(merged[i].name) == str(merged[j].name):
            print('----DUPLICATE %s %.2f Mb = %s %.2f Mb' % (merged[i].name, merged[i].stat().st_size/1024**2, merged[j].name, merged[j].stat().st_size/1024**2))
            if merged[i].stat().st_size >= merged[j].stat().st_size:
                print('removing %s %.2f Mb' % (merged[i].name, merged[i].stat().st_size/1024**2))
                os.remove(merged[i])
            elif merged[i].stat().st_size < merged[j].stat().st_size:
                print('removing %s %.2f Mb' % (merged[j].name, merged[j].stat().st_size/1024**2))
                os.remove(merged[j])

CodePudding user response:

Deleting files based on size

This is a simple procedure and can be implemented in a few small functions.

import os


def compare_folders(path1, path2):
    ignore = [".", "..", ".DS_Store"]  # ignore these pointers/files
    for file in os.listdir(path1):
        if file in os.listdir(path2):
            if file not in ignore:
                delete_larger_file(path1 + "/" + file, path2 + "/" + file)


def merge_folders(path1, path2):
    # move files that exist only in path1 over to path2
    for file in os.listdir(path1):
        if file not in os.listdir(path2):
            os.rename(path1 + "/" + file, path2 + "/" + file)


def delete_larger_file(path1, path2):
    # on a tie, the copy at path2 is the one removed
    if os.path.getsize(path1) > os.path.getsize(path2):
        os.remove(path1)
    else:
        os.remove(path2)

What's going on here?

  • The first function, compare_folders(), takes the paths of the two folders as inputs. It iterates through the contents of the first folder and, for every file name that also exists in the second folder, calls delete_larger_file(), which compares the sizes of the 2 files and deletes the larger one.
  • A subsequent call to merge_folders() is necessary to merge the folders in place: it moves every file that exists only in the first folder into the second. In the end, the first folder should be empty and the second should hold the smaller copy of each file.
  • Be warned: deletions cannot be undone, so test it on a copy first. Also, this does not handle subfolders; supporting them would require recursion.
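
One way to test it first is a dry run that only reports what would be removed. The sketch below is an assumption layered on the functions above, not part of the original answer: plan_deletions() and its ignore parameter are hypothetical names, and it applies the same size rule as delete_larger_file() without calling os.remove().

```python
import os


def plan_deletions(path1, path2, ignore=(".DS_Store",)):
    """Return the paths that would be deleted, without deleting anything.

    Hypothetical helper: mirrors compare_folders()/delete_larger_file()
    but only collects the paths instead of removing them.
    """
    to_delete = []
    names_in_path2 = set(os.listdir(path2))
    for name in sorted(os.listdir(path1)):
        if name in ignore or name not in names_in_path2:
            continue  # not a duplicate name, or explicitly ignored
        first = os.path.join(path1, name)
        second = os.path.join(path2, name)
        # Skip subfolders (and anything else that is not a regular file)
        if not (os.path.isfile(first) and os.path.isfile(second)):
            continue
        # Same rule as delete_larger_file(): on a tie the path2 copy goes
        if os.path.getsize(first) > os.path.getsize(second):
            to_delete.append(first)
        else:
            to_delete.append(second)
    return to_delete
```

Printing the result of plan_deletions('test/Original', 'test/Compressed') (folder names taken from the question) lets you check the list before running the real deletion.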

First call compare_folders(), then call merge_folders().
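
On the "filecompare" part of the question: that is presumably the standard filecmp module. It does not report sizes, but filecmp.dircmp can list the file names common to both folders, and os.path.getsize() then supplies the sizes. A minimal sketch, assuming that reading of the question; larger_copies() is a hypothetical name, and on a tie it picks the first folder's copy, like the code in the question.

```python
import filecmp
import os


def larger_copies(dir_a, dir_b):
    """Yield the path of the larger copy for each file name in both folders."""
    comparison = filecmp.dircmp(dir_a, dir_b)
    # common_files holds names that are regular files in both folders
    for name in sorted(comparison.common_files):
        in_a = os.path.join(dir_a, name)
        in_b = os.path.join(dir_b, name)
        # filecmp only finds the shared names; sizes come from os.path.getsize
        if os.path.getsize(in_a) >= os.path.getsize(in_b):
            yield in_a
        else:
            yield in_b
```

Each yielded path could then be passed to os.remove() once the list looks right.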

CodePudding user response:

I'm posting the complete code in case anyone needs it. Thanks to @drow339 for it!

import os

path1 = 'test1/a'
path2 = 'test1/b'

print('Comparing the folders: %s and %s' % (path1, path2))


def compare_folders(path1, path2):
    ignore = [".", "..", ".DS_Store"]  # ignore these pointers/ files
    for file in os.listdir(path1):
        print('Checking duplicates for %s' % file)
        if file in os.listdir(path2):
            print('ok')
            if file not in ignore:
                print('Duplicate found: %s <---------' % file)
                delete_larger_file(path1 + "/" + file, path2 + "/" + file)


def delete_larger_file(path1, path2):
    if os.path.getsize(path1) >= os.path.getsize(path2):
        print('Duplicates: %s - %s - deleting the first one' % (path1, path2))
        os.remove(path1)
    else:
        print('Duplicates: %s - %s - deleting the second one' % (path1, path2))
        os.remove(path2)


def merge_folders(path1, path2):
    for file in os.listdir(path1):
        if file not in os.listdir(path2):
            os.rename(path1 + "/" + file, path2 + "/" + file)


compare_folders(path1, path2)
merge_folders(path1, path2)
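
For the "third folder" option mentioned in the question, a non-destructive variant could copy the smaller copy of each file into a new folder and leave both sources untouched. This is only a sketch under that assumption: copy_smaller_files() and the dest argument are hypothetical, and subfolders are skipped.

```python
import os
import shutil


def copy_smaller_files(path1, path2, dest):
    """Copy every file into dest, taking the smaller copy of each shared name."""
    os.makedirs(dest, exist_ok=True)
    chosen = {}  # file name -> path of the copy to keep
    for folder in (path1, path2):
        for name in os.listdir(folder):
            candidate = os.path.join(folder, name)
            if not os.path.isfile(candidate):
                continue  # subfolders are skipped in this sketch
            current = chosen.get(name)
            # "<=" means that on a tie the path2 copy wins, which matches
            # delete_larger_file() above (it removes the path1 copy on a tie)
            if current is None or os.path.getsize(candidate) <= os.path.getsize(current):
                chosen[name] = candidate
    for name, source in chosen.items():
        shutil.copy2(source, os.path.join(dest, name))
    return chosen
```

Because nothing is deleted, the result can be checked before removing the two source folders by hand.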