Use Python to download S3 objects to an arbitrarily defined local directory


I have a Python function (see below) that iteratively downloads objects from a remote S3 directory and stores them in a local folder.

In its current state, the files go here:

AnalysisOutput/file1
AnalysisOutput/file2
AnalysisOutput/file3

AnalysisOutput is the name of the remote directory (the key prefix). I don't want that directory to be hard-coded on my local instance. Instead, I want them to go here:

tempS3output/file1
tempS3output/file2
tempS3output/file3
import os

import boto3

def downloadDirectoryFroms3(bucketName, remoteDirectoryName):
    s3_resource = boto3.resource('s3')
    bucket = s3_resource.Bucket(bucketName)
    number = 0
    for object in bucket.objects.filter(Prefix=remoteDirectoryName):
        number = number + 1
        if not os.path.exists(os.path.dirname(object.key)):
            os.makedirs(os.path.dirname(object.key))
        bucket.download_file(object.key, object.key)

downloadDirectoryFroms3('reciter-dynamodb', 'AnalysisOutput')

CodePudding user response:

You could use str.removeprefix (available since Python 3.9) to strip the remote directory from each key and change where the file is downloaded. Your function would become:

def download_directory_from_s3(bucket_name, remote_directory_name, local_directory_name):
    s3_resource = boto3.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)
    number = 0
    for object in bucket.objects.filter(Prefix=remote_directory_name):
        number = number + 1
        local_path = f"{local_directory_name}/{object.key.removeprefix(remote_directory_name).lstrip('/')}"
        if not os.path.exists(os.path.dirname(local_path)):
            os.makedirs(os.path.dirname(local_path))
        bucket.download_file(object.key, local_path)

download_directory_from_s3('reciter-dynamodb', 'AnalysisOutput', 'tempS3output')

This assumes object.key is a string; if it is not, convert it first with something like str(object.key).
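Note that removeprefix leaves the separator slash behind, and it is only available on Python 3.9 or later. A minimal sketch of a hypothetical strip_prefix helper (the name is my own, not from the answer above) that also works on older Pythons:

```python
def strip_prefix(key: str, prefix: str) -> str:
    """Remove prefix from key by slicing (pre-3.9 compatible),
    then drop the leading slash left behind by the prefix."""
    if key.startswith(prefix):
        key = key[len(prefix):]
    return key.lstrip("/")

print(strip_prefix("AnalysisOutput/file1", "AnalysisOutput"))  # file1
print(strip_prefix("other/file", "AnalysisOutput"))            # other/file (unchanged)
```

Without the lstrip, joining "tempS3output" with "/file1" would produce a double slash ("tempS3output//file1"), which works on POSIX but is untidy.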

CodePudding user response:

The comment from @Methacrylon was very helpful. I made a couple of tweaks to make this work for me:

def download_directory_from_s3(bucket_name, remote_directory_name, local_directory_name):
    s3_resource = boto3.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)
    number = 0
    for object in bucket.objects.filter(Prefix=remote_directory_name):
        number = number + 1
        print(object)
        file_name = local_directory_name + "/" + object.key.removeprefix(remote_directory_name).lstrip("/")
        if not os.path.exists(os.path.dirname(file_name)):
            os.makedirs(os.path.dirname(file_name))
        bucket.download_file(object.key, file_name)

download_directory_from_s3('reciter-dynamodb', 'AnalysisOutput', 'temp')
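The key-to-path mapping in both answers can be isolated into a pure helper, which makes it easy to test without touching AWS. This is a sketch under my own naming (map_key_to_local is hypothetical, not part of boto3), using os.path.join so nested keys and path separators are handled cleanly:

```python
import os

def map_key_to_local(key: str, remote_prefix: str, local_dir: str) -> str:
    """Map an S3 object key to a path under local_dir, preserving
    any sub-directories below the remote prefix."""
    relative = key[len(remote_prefix):].lstrip("/") if key.startswith(remote_prefix) else key
    return os.path.join(local_dir, relative)

path = map_key_to_local("AnalysisOutput/sub/file1", "AnalysisOutput", "tempS3output")
# On POSIX this yields "tempS3output/sub/file1"
os.makedirs(os.path.dirname(path) or ".", exist_ok=True)  # ensure the target dir exists
```

One caveat worth knowing: listing a prefix can also return the "directory" key itself (a key ending in "/"); skipping such keys before calling download_file avoids an error on an empty file name.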