I am trying to list S3 objects like this:
for key in s3_client.list_objects(Bucket='bucketname')['Contents']:
    logger.debug(key['Key'])
I just want to print the folder names or file names that are present on the first layer.
For example, if my bucket has this:
bucketname
    folder1
    folder2
        text1.txt
        text2.txt
    catalog.json
I only want to print folder1, folder2 and catalog.json. I don't want to include text1.txt etc.
However, my current solution also prints the names of the files inside the folders in my bucket.
How can I modify this? I saw that there's a 'Prefix' parameter, but I'm not sure how to use it.
CodePudding user response:
You can split the keys on "/" and only keep the first level:
level1 = set()  # Using a set removes duplicates automatically
for key in s3_client.list_objects(Bucket='bucketname')['Contents']:
    level1.add(key["Key"].split("/")[0])  # Here we only keep the first level of the key
# Then print your level1 set
logger.debug(level1)
/!\ Warnings
The list_objects method has been revised, and it is recommended to use list_objects_v2 instead.
According to the AWS S3 documentation, this method returns some or all (up to 1,000) of the keys per call. If you want to make sure you get all the keys, you need to pass the NextContinuationToken returned by the function back in as ContinuationToken:
level1 = set()
continuation_token = ""
while continuation_token is not None:
    extra_params = {"ContinuationToken": continuation_token} if continuation_token else {}
    response = s3_client.list_objects_v2(Bucket="bucketname", Prefix="", **extra_params)
    continuation_token = response.get("NextContinuationToken")
    for obj in response.get("Contents", []):
        level1.add(obj.get("Key").split("/")[0])
logger.debug(level1)
CodePudding user response:
You can use the Delimiter option, for example:
import boto3

s3 = boto3.client("s3")
BUCKET = "bucketname"

rsp = s3.list_objects_v2(Bucket=BUCKET, Delimiter="/")
# Either key is absent from the response when there is nothing to report,
# so use .get() with a default instead of indexing directly.
objects = [obj["Key"] for obj in rsp.get("Contents", [])]
folders = [fld["Prefix"] for fld in rsp.get("CommonPrefixes", [])]

for obj in objects:
    print("Object:", obj)
for folder in folders:
    print("Folder:", folder)
Result:
Object: catalog.json
Folder: folder1/
Folder: folder2/
Note that if you have a large number of keys at the top level (over 1,000), you will need to paginate your requests.
Also, note that list_objects is essentially deprecated and you should use list_objects_v2.
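For the pagination case, here is a minimal sketch that combines Delimiter with boto3's built-in paginator for list_objects_v2, so you don't have to handle the continuation token yourself. The function name list_level and its prefix argument are my own (hypothetical) names, not part of boto3. The prefix argument also shows one way the 'Prefix' parameter from the question can be used: pass something like "folder1/" to list only what sits directly inside that folder.

```python
def list_level(s3_client, bucket, prefix=""):
    """Return (folders, files) found directly under `prefix` in `bucket`.

    `list_level` is a hypothetical helper name. `s3_client` is expected to be
    a boto3 S3 client, e.g. boto3.client("s3").
    """
    folders, files = [], []
    # The paginator follows NextContinuationToken across pages for us.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # Keys containing the delimiter after the prefix are rolled up here.
        for common_prefix in page.get("CommonPrefixes", []):
            folders.append(common_prefix["Prefix"].rstrip("/"))
        # Keys with no further delimiter are plain objects at this level.
        for obj in page.get("Contents", []):
            files.append(obj["Key"])
    return folders, files
```

You would call it as folders, files = list_level(boto3.client("s3"), "bucketname") and then log both lists. The trailing "/" is stripped from the folder names only so that they print as folder1 rather than folder1/; drop the rstrip call if you want the raw prefixes.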