Fastest way to move objects within an S3 bucket using boto3


I need to copy all files from one prefix in S3 to another prefix within the same bucket. My solution is something like:

file_list = [List of files in first prefix]
for file in file_list:
    copy_source = {'Bucket': my_bucket, 'Key': file}
    new_key = new_prefix + file.split('/')[-1]  # keep the file name under the new prefix
    s3_client.copy(copy_source, my_bucket, new_key)

However, I am only moving 200 tiny files (1 KB each), and this procedure takes up to 30 seconds. Surely it must be possible to do it faster?

CodePudding user response:

I would do it in parallel. For example:

from multiprocessing import Pool

import boto3

file_list = [List of files in first prefix]

def s3_copier(s3_file):
    # boto3 clients should not be shared across processes, so create one per call
    s3_client = boto3.client('s3')
    copy_source = {'Bucket': my_bucket, 'Key': s3_file}
    new_key = new_prefix + s3_file.split('/')[-1]  # keep the file name under the new prefix
    s3_client.copy(copy_source, my_bucket, new_key)

# copy 5 objects at the same time
with Pool(5) as p:
    p.map(s3_copier, file_list)
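
Since these copies are I/O-bound rather than CPU-bound, threads work just as well and avoid the pickling overhead of multiprocessing. Here is a minimal sketch of the same idea with concurrent.futures.ThreadPoolExecutor; the pool size of 20 is an arbitrary starting point, and boto3 documents its clients as thread-safe, so one shared client is fine:

import boto3
from concurrent.futures import ThreadPoolExecutor

s3_client = boto3.client('s3')

def s3_copier(s3_file):
    copy_source = {'Bucket': my_bucket, 'Key': s3_file}
    new_key = new_prefix + s3_file.split('/')[-1]  # keep the file name under the new prefix
    s3_client.copy(copy_source, my_bucket, new_key)

# threads share the one client; the copies overlap on the network
with ThreadPoolExecutor(max_workers=20) as executor:
    executor.map(s3_copier, file_list)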

CodePudding user response:

So you have a function you need to call on a bunch of things, all of which are independent of each other. You could try multiprocessing.

from multiprocessing import Process

import boto3

def copy_file(file_name, my_bucket):
    s3_client = boto3.client('s3')  # each process needs its own client
    copy_source = {'Bucket': my_bucket, 'Key': file_name}
    new_key = new_prefix + file_name.split('/')[-1]  # keep the file name under the new prefix
    s3_client.copy(copy_source, my_bucket, new_key)

def main():
    file_list = [...]

    processes = []
    for file_name in file_list:
        p = Process(target=copy_file, args=[file_name, my_bucket])
        p.start()
        processes.append(p)

    # wait for every copy to finish
    for p in processes:
        p.join()

Then they can all start at (approximately) the same time, instead of each copy having to wait for the previous one to complete.
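
Note that both approaches assume file_list has already been built. A minimal sketch for collecting the keys under the source prefix with boto3's list_objects_v2 paginator (old_prefix here is a placeholder name for the source prefix):

import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')

file_list = []
# walk every result page, so more than 1000 keys are handled
for page in paginator.paginate(Bucket=my_bucket, Prefix=old_prefix):
    for obj in page.get('Contents', []):
        file_list.append(obj['Key'])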
