S3 Paths | boto3

Time:10-07

I have a process that reads the location of its data source directory from a YAML file:

import yaml

with open(r'<yaml file>') as file:
    directory = yaml.load(file, Loader=yaml.FullLoader)

source_directory = directory['source_directory']

The yaml file reads as follows:

source_directory : '<directory>'

However, the data source has now moved from a local directory to an S3 bucket. I am able to view the files in my S3 bucket using the code below:

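import boto3

# Credentials and region are taken from the environment or AWS config
client = boto3.client('s3')

def ListFiles(client):
    # list_objects returns at most 1000 keys per call
    response = client.list_objects(Bucket='<bucket name>')
    for content in response.get('Contents', []):
        yield content.get('Key')


file_list = ListFiles(client)
for file in file_list:
    print(file)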

The boto3 code correctly lists my files, so I know my connection to the bucket is successful. How do I reference the directory path within the S3 bucket in the variable source_directory in the yaml file?


Update based on a comment I got:

Someone suggested using s3://<bucket_name>/object_path as the value in the YAML file. However, this produces a No such file or directory error.

CodePudding user response:

If you want to list the objects under a given directory (key prefix) of an Amazon S3 bucket, you can use:

response = client.list_objects_v2(Bucket='<bucket name>', Prefix=source_directory)

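S3 matches Prefix as a plain string against the start of each object key, so it should end with a slash (otherwise my-directory would also match keys under my-directory-old/). The YAML file should therefore look like: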

source_directory : 'my-directory/'
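
As for the s3:// attempt in the update: local file functions such as open() treat that string as an ordinary path on disk, which is most likely where the No such file or directory error comes from. If you would rather keep a single s3://... value in the YAML file, one option is to split it into a bucket and a prefix yourself before calling boto3. Below is a minimal sketch of that idea, not a definitive implementation. The helper names split_s3_uri and list_keys are made up for illustration; get_paginator and list_objects_v2 are the boto3 client calls. The paginator is used because a single list call returns at most 1000 keys.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/dir' into ('bucket', 'some/dir/').

    Hypothetical helper: boto3 wants the bucket and the key prefix
    as separate arguments, not as one URI.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    prefix = parsed.path.lstrip("/")
    # Add a trailing slash so 'data' does not also match 'data-old/...'
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix


def list_keys(client, bucket, prefix=""):
    """Yield every object key under prefix, following pagination.

    client is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no 'Contents' entry
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With source_directory : 's3://<bucket name>/my-directory' in the YAML file, you would call bucket, prefix = split_s3_uri(source_directory) and then loop over list_keys(client, bucket, prefix), where client = boto3.client('s3') as in the question.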