I have an existing process that reads a data source directory from a YAML file:
import yaml

with open(r'<yaml file>') as file:
    directory = yaml.load(file, Loader=yaml.FullLoader)
    source_directory = directory['source_directory']
The YAML file reads as follows:
source_directory : '<directory>'
However, the data source has now shifted from a local directory to an S3 bucket. I am able to view the files in my S3 bucket by using the code below:
import boto3

client = boto3.client('s3')

def ListFiles(client):
    response = client.list_objects(Bucket='<bucket name>')
    for content in response.get('Contents', []):
        yield content.get('Key')

file_list = ListFiles(client)
for file in file_list:
    print(file)
The boto3 code correctly lists my files, so I know my connection to the bucket is successful. How do I reference the directory path within the S3 bucket in the source_directory variable in the YAML file?
Update based on a comment I got:
Someone suggested using s3://<bucket_name>/object_path in place of the local path in the YAML file. However, this produces a "No such file or directory" error.
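I assume the error appears because the rest of my pipeline still treats source_directory as a local filesystem path; Python's built-in open() does not understand s3:// URIs. A minimal illustration (the path below is a made-up example, not my real bucket):

open('s3://my-bucket/my-directory/data.csv')
# FileNotFoundError: [Errno 2] No such file or directory: 's3://my-bucket/my-directory/data.csv'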
CodePudding user response:
If you want to obtain a list of objects in a given directory of an Amazon S3 bucket, you can use:
response = client.list_objects_v2(Bucket='<bucket name>', Prefix=source_directory)
The Prefix should end with a slash, so the YAML file should look like:
source_directory : 'my-directory/'
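As a rough sketch of how this could plug into the existing YAML-driven setup, the YAML file can hold both the bucket name and the prefix. The file name config.yaml and the bucket key below are placeholders I am introducing for illustration, not part of the original code:

bucket : 'my-bucket'
source_directory : 'my-directory/'

import boto3
import yaml

# Load the bucket name and the prefix from the YAML config.
with open('config.yaml') as file:
    config = yaml.load(file, Loader=yaml.FullLoader)

bucket = config['bucket']
source_directory = config['source_directory']   # e.g. 'my-directory/'

client = boto3.client('s3')

# List every object whose key starts with the prefix.
response = client.list_objects_v2(Bucket=bucket, Prefix=source_directory)
for obj in response.get('Contents', []):
    print(obj['Key'])

# list_objects_v2 returns at most 1,000 keys per call, so a paginator is
# safer for larger "directories".
paginator = client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=source_directory):
    for obj in page.get('Contents', []):
        print(obj['Key'])

Individual objects under the prefix can then be read with client.get_object(Bucket=bucket, Key=obj['Key']) rather than with open(), since the keys are not local paths.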