Is there a more efficient way to make this function?

Time:08-25

This function copies files from one folder to another according to file type. The problem is that when there are a lot of files, copying takes too long. Is there another way of doing it, perhaps with another library, language, or syntax?

from os import listdir
from os.path import isfile, join
import shutil

# raw_folder and jpg_folder are assumed to be defined elsewhere in the script

def main_copy(source, destination):

    # List of all files inside directory
    files_fullpath = [f for f in listdir(source) 
                        if isfile(join(source, f))] 

    # Copy files to the correct folder according to filetype
    if len(files_fullpath) != 0:
        for fs in files_fullpath:
            full_file = source + "\\" + fs
            if str(fs).endswith('.ARW'):
                shutil.copy(full_file, raw_folder + "\\" + fs)
            else:
                shutil.copy(full_file, jpg_folder + "\\" + fs)
        if len(listdir(destination)) != 0:
            print("Files moved successfully!")

CodePudding user response:

You can try threading or multiprocessing to speed up the copying. I would advise using the concurrent.futures module with either ThreadPoolExecutor or ProcessPoolExecutor.
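
As a rough illustration of that suggestion, here is a minimal sketch using ThreadPoolExecutor that keeps the question's rule of sending .ARW files to one folder and everything else to another. The function name copy_by_type and the max_workers default are made up for this example, and raw_folder and jpg_folder are passed as parameters here although the question uses them as globals. Whether it is actually faster depends on your disk.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_by_type(source, raw_folder, jpg_folder, max_workers=8):
    """Copy .ARW files to raw_folder and every other file to jpg_folder."""
    source = Path(source)
    # Assumption: the two target folders are parameters (globals in the question).
    raw_folder, jpg_folder = Path(raw_folder), Path(jpg_folder)
    raw_folder.mkdir(parents=True, exist_ok=True)
    jpg_folder.mkdir(parents=True, exist_ok=True)

    def copy_one(path):
        # Same case-sensitive check as the original function
        target = raw_folder if path.name.endswith(".ARW") else jpg_folder
        return shutil.copy(path, target / path.name)

    # Only regular files, like the isfile() filter in the question
    files = [p for p in source.iterdir() if p.is_file()]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() waits for every copy and re-raises any error from a worker
        return list(executor.map(copy_one, files))
```

Threads are usually enough here because the time is spent waiting on the disk rather than running Python code; ProcessPoolExecutor would need copy_one to be a top-level function so it can be pickled.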

CodePudding user response:

A good idea to speed up your procedure would be to parallelize the computation up to the maximum disk write speed.

You could use the multiprocessing library and choose between a pool of threads (multiprocessing.pool.ThreadPool) and a pool of processes (multiprocessing.Pool).

I'll show you an example using multithreading, but multiprocessing might also be a good choice (depends on many factors).

import os
import glob
import shutil
from functools import partial
from multiprocessing.pool import ThreadPool

def multi_copy(source, destination, n=8):
    # copy_to_mydir will copy any file you give it to destination dir
    copy_to_mydir = partial(shutil.copy, dst=destination)

    # list of files we want to copy (skip subdirectories, shutil.copy cannot copy them)
    to_copy = [f for f in glob.glob(os.path.join(source, '*'))
               if os.path.isfile(f)]

    with ThreadPool(n) as p:
        p.map(copy_to_mydir, to_copy)

If you want to go deeper into multiprocessing vs multithreading, you can read this beautiful post.

Note: The bottleneck here is most likely the writes to disk, so the gain depends on the machine running the code. On some machines you will not get any benefit, since a single process can already saturate the disk's write capacity. In many other cases, writing to disk in parallel is faster.
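
Since the benefit depends on the machine, one way to check is to time the same copy with different thread counts. This is only a sketch: timed_copy and its parameters are hypothetical names for this example, not part of any library.

```python
import shutil
import time
from multiprocessing.pool import ThreadPool
from pathlib import Path


def timed_copy(files, destination, n_threads):
    """Copy files into destination using n_threads; return elapsed seconds."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)

    start = time.perf_counter()
    with ThreadPool(n_threads) as pool:
        # map() blocks until every file has been copied
        pool.map(lambda f: shutil.copy(f, destination), files)
    return time.perf_counter() - start
```

Run it on a representative sample of your files, once with n_threads=1 and once with, say, n_threads=8, using a fresh destination folder each time. Repeated runs over the same source files can look faster than they really are because the operating system caches recently read data.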
