This function copies files from one folder to another acording to filetype. The problem is when the number of files is so big that it takes too long to copy. Maybe there is another way of doing it? Using another library/language/syntax?
def main_copy(source, destination):
# List of all files inside directory
files_fullpath = [f for f in listdir(source)
if isfile(join(source, f))]
# Copy files to the correct folder according to filetype
if len(files_fullpath) != 0:
for fs in files_fullpath:
full_file = source "\\" fs
if str(fs).endswith('.ARW'):
shutil.copy(full_file, raw_folder "\\" fs)
else:
shutil.copy(full_file, jpg_folder "\\" fs)
if len(listdir(destination)) != 0:
print("Files moved succesfully!")
CodePudding user response:
You can try to use threading
or multiprocessing
to speed up copying. I would advise to use concurrent.futures
module with either ThreadPoolExecutor
or ProcessPoolExecutor
.
CodePudding user response:
A good idea to speed up your procedure would be to parallelize the computation up to the maximum disk write speed.
You could use the multiprocessing library and decide whether to use multithreading (multiprocessing.pool.ThreadPool) pools or multiprocessing (multiprocessing.Pool).
I'll show you an example using multithreading, but multiprocessing might also be a good choice (depends on many factors).
import os
import glob
import shutil
from functools import partial
from multiprocessing.pool import ThreadPool
def multi_copy(source, destination, n=8):
# copy_to_mydir will copy any file you give it to destination dir
copy_to_mydir = partial(shutil.copy, dst=destination)
# list of files we want to copy
to_copy = glob.glob(os.path.join(source, '*'))
with ThreadPool(n) as p:
p.map(copy_to_mydir, to_copy)
If you want to go deeper into multiprocessing vs multithreading, you can read this beautiful post.
Note: The bottleneck of this problem should be the writes to disk. It depends on the machine on which you are running the code, probably on some machines you will not get any benefit since one process can saturate the write-to-disk capacity. In many cases it may be advantageous to write to disk in parallel instead.