I am trying to replicate the command `aws s3 ls s3://bucket/prefix/` using boto3. Currently, I am able to grab all the objects within the path using:
```python
import boto3

s3 = boto3.client('s3')
bucket = "my-bucket"
prefix = "my-prefix"
paginator = s3.get_paginator('list_objects_v2')
page_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)
```
Then, I can iterate through page_iterator and manually reconstruct the top-level directories within that path. However, since there are a ton of objects inside the path, retrieving all the objects to reconstruct the results of this command takes roughly 30 seconds for me, whereas the AWS CLI command is pretty much instant. Is there a more efficient way to do this?
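For reference, the client-side reconstruction described above might look roughly like the sketch below. `top_level_entries` is a hypothetical helper, not part of boto3; it assumes you already have every key under the prefix, which is exactly why this approach is slow for large prefixes.

```python
from typing import Iterable, List, Tuple


def top_level_entries(keys: Iterable[str], prefix: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split full object keys into top-level
    "directories" and objects sitting directly under `prefix`.

    This repeats on the client what S3 can do server-side with
    Delimiter="/", but it needs every key to be downloaded first.
    """
    dirs = set()
    files = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # ignore keys outside the requested prefix
        rest = key[len(prefix):]
        slash = rest.find("/")
        if slash == -1:
            files.append(key)  # object lives directly under the prefix
        else:
            # keep everything up to and including the first "/" after the prefix
            dirs.add(prefix + rest[: slash + 1])
    return sorted(dirs), files
```

The keys could be fed in from the paginator with something like `(obj["Key"] for page in page_iterator for obj in page.get("Contents", []))`, but every page still has to be fetched.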
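You should use the `Delimiter` option of `list_objects_v2` to group any objects with a common prefix together. S3 then does the grouping server-side and returns a single `CommonPrefixes` entry per "subdirectory" instead of every object beneath it, so far fewer keys come back. This is basically what `aws s3 ls` does without the `--recursive` switch: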
```python
import boto3

s3 = boto3.client('s3')
bucket = "my-bucket"
# The trailing "/" matters: with Delimiter="/", Prefix="my-prefix" would just
# group everything into the single common prefix "my-prefix/"
prefix = "my-prefix/"
paginator = s3.get_paginator('list_objects_v2')
# List all objects, group objects with a common prefix
for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
    # CommonPrefixes and Contents might not be included in a page if there
    # are no items, so use .get() to return an empty list in that case
    for cur in page.get("CommonPrefixes", []):
        print("PRE", cur["Prefix"])
    for cur in page.get("Contents", []):
        print(cur["Size"], cur["Key"])
```
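If you need the results as data rather than printed lines, the same loop can be wrapped in a function. This is a minimal sketch: `list_s3_dir` is a hypothetical name, the client is passed in so it can be any boto3 S3 client, and normalizing the prefix to end in "/" is a design choice made here to match `aws s3 ls s3://bucket/prefix/` with its trailing slash.

```python
from typing import Any, Dict, List, Tuple


def list_s3_dir(s3_client: Any, bucket: str, prefix: str) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Hypothetical helper: return (common_prefixes, objects) for one
    "directory" level, using server-side grouping via Delimiter="/".

    `s3_client` is assumed to be a boto3 S3 client (anything exposing
    get_paginator("list_objects_v2") will do).
    """
    # List the folder's contents rather than the folder itself;
    # an empty prefix is left alone so the bucket root can be listed.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    prefixes: List[str] = []
    objects: List[Dict[str, Any]] = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # Either key may be missing from a page, hence .get() with a default
        prefixes.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
        objects.extend(page.get("Contents", []))
    return prefixes, objects
```

Usage would be along the lines of `prefixes, objects = list_s3_dir(boto3.client("s3"), "my-bucket", "my-prefix")`, after which each entry in `objects` is the same dict the printing loop reads `Size` and `Key` from.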