I want to download only the file(s) that sit directly under a prefix "folder", not the ones in its sub-directories. The code below lists every file under the prefix, including those in sub-directories. Is there a way to list only the file(s) directly under the prefix?
bucket = aws_resource_session.Bucket(bucket_name)
for obj in bucket.objects.filter(Prefix=s3_end_point_properties["prefix"]):
    print(obj.key)
Thanks, Imran Khan
CodePudding user response:
There are no "folders" in S3. Imagine every object sitting in a single flat namespace at the root of the bucket. Keys are long names that contain "/", and that is how all objects are stored. AWS presents them as folders only because people are usually familiar with a folder structure.
When you specify a prefix, you get every object whose key starts with it. You can use Python's standard string handling to filter out what you don't want before the download.
You know that each key starts with your prefix, so you can strip the prefix from each key first, then check whether the remainder still contains '/'. If it does, the object is "inside" a subfolder (I am using your terms to make it clear), which you don't want.
E.g.:
objects = bucket.objects.filter(Prefix=prefix)
# Slice off the leading prefix; keep keys with no further '/' in the remainder
files = [obj for obj in objects if '/' not in obj.key[len(prefix):]]
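The same filtering idea can be sketched as a standalone function that works on plain key strings, so it can be checked without touching S3. The function name `direct_children` and the sample keys are illustrative, not part of boto3. It slices off the leading prefix instead of calling `replace`, so a prefix that happens to repeat later in a key does not affect the result, and it assumes the prefix ends with "/".

```python
def direct_children(keys, prefix):
    """Return the keys that sit directly under prefix, skipping deeper 'sub-folders'."""
    result = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        # An empty remainder is the "folder" placeholder object itself; skip it
        if remainder and "/" not in remainder:
            result.append(key)
    return result


# Hypothetical keys, as they might come back from a prefix listing
sample_keys = ["csv/", "csv/a.csv", "csv/b.csv", "csv/corrupt/d.csv", "csv/complete/e.csv"]
print(direct_children(sample_keys, "csv/"))
```

With the resource interface from the question, the input could be `[obj.key for obj in bucket.objects.filter(Prefix=prefix)]`.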
CodePudding user response:
Rather than use the higher-level Resource interface Bucket, which simply gives you a flat list of every object under the prefix, you can use the lower-level Client interface. Specifically, if you include the Delimiter parameter when calling list_objects_v2, the response returns the objects directly under the given prefix in "Contents" and the 'sub-folders' in "CommonPrefixes".
Example:
import boto3
s3 = boto3.client("s3")
rsp = s3.list_objects_v2(Bucket="mybucket", Prefix="myprefix/", Delimiter="/")
# "Contents" and "CommonPrefixes" are omitted from the response when empty
print("Objects:", [obj["Key"] for obj in rsp.get("Contents", [])])
print("Sub-folders:", [obj["Prefix"] for obj in rsp.get("CommonPrefixes", [])])
Sample output with Prefix="csv/":
Objects: ['csv/a.csv', 'csv/b.csv', 'csv/c.csv']
Sub-folders: ['csv/corrupt/', 'csv/complete/']
If you do not include the Delimiter parameter, then all objects at this prefix and below will be present in "Contents", for example:
Objects: ['csv/a.csv', 'csv/b.csv', 'csv/c.csv', 'csv/corrupt/d.csv', 'csv/complete/e.csv']
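A single list_objects_v2 call returns at most 1,000 keys, so for larger prefixes the listing has to be paged. Below is a minimal sketch that combines the Delimiter approach with the client's paginator and then downloads each file, which is what the question was after. The helper names `list_direct_keys` and `download_direct_keys`, the bucket and prefix names, and the destination directory are all assumptions for illustration. The client is passed in as an argument so the listing logic can be exercised with a stand-in object.

```python
import os


def list_direct_keys(client, bucket, prefix):
    """Collect the keys directly under prefix, across all result pages."""
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # "Contents" is absent from a page that has no objects
        for obj in page.get("Contents", []):
            # Skip the zero-byte placeholder some tools create for the "folder" itself
            if obj["Key"] != prefix:
                keys.append(obj["Key"])
    return keys


def download_direct_keys(client, bucket, prefix, dest_dir):
    """Download every direct child of prefix into dest_dir, named by its basename."""
    os.makedirs(dest_dir, exist_ok=True)
    for key in list_direct_keys(client, bucket, prefix):
        client.download_file(bucket, key, os.path.join(dest_dir, os.path.basename(key)))


# Usage (needs boto3 and AWS credentials; all names are placeholders):
# import boto3
# download_direct_keys(boto3.client("s3"), "mybucket", "myprefix/", "downloads")
```

Because of the Delimiter, keys in 'sub-folders' never appear in "Contents" here, so no string filtering is needed on the client side.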