S3 boto3 list directories only

I have the hierarchy below in S3 and would like to retrieve only the subfolder information, excluding the files that end in .txt (basically, exclude filenames and retrieve only prefixes/folders).

--folder1/subfolder1/item1.txt
--folder1/subfolder1/item11.txt
--folder1/subfolder2/item2.txt
--folder1/subfolder2/item21.txt
--folder1/subfolder3/item3.txt
--folder1/subfolder3/subfolder31/item311.txt

Desired Output:

--folder1/subfolder1
--folder1/subfolder2
--folder1/subfolder3/subfolder31

I understand that there are no folders or subfolders in S3, only keys.

I tried the code below, but it displays everything, including filenames like item1.txt:

import boto3

s3 = boto3.resource('s3')
client = boto3.client('s3')
bucket = s3.Bucket('s3-bucketname')
paginator = client.get_paginator('list_objects')

# Prints every key under the prefix, filenames included
for obj in bucket.objects.filter(Prefix='folder1/'):
    print(obj.key)

Any recommendations for getting the output below?

--folder1/subfolder1
--folder1/subfolder2
--folder1/subfolder3/subfolder31

CodePudding user response：

As you say, S3 doesn't really have a concept of folders, so to get what you want, in a sense, you need to recreate it.

One option is to list all of the objects under the prefix, derive the folder (prefix) of each object from its key, and print each new folder name the first time you come across it:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('s3-bucketname')
shown = set()  # prefixes already printed
for obj in bucket.objects.filter(Prefix='folder1/'):
    # Drop the last path component (the filename) to get the "folder"
    prefix = "/".join(obj.key.split("/")[:-1])
    if prefix and prefix not in shown:
        shown.add(prefix)
        print(prefix + "/")
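
Note that the loop above prints every folder that directly contains an object, so for the sample keys it also prints folder1/subfolder3/, which is not in the desired output. If only the deepest folders are wanted, one possible follow-up is to drop any prefix that is a parent of another prefix. The sketch below is not part of the original answer: the helper names folder_prefixes and leaf_prefixes are made up for illustration, and it works on a plain list of key strings so it can be tried without touching S3.

```python
from typing import Iterable, List


def folder_prefixes(keys: Iterable[str]) -> List[str]:
    """Return the distinct 'folder' part of each key, in first-seen order.

    Hypothetical helper: same idea as the loop above, but on plain strings.
    """
    seen = {}  # dict used as an ordered set
    for key in keys:
        prefix = key.rpartition("/")[0]  # everything before the last "/"
        if prefix:
            seen.setdefault(prefix, None)
    return list(seen)


def leaf_prefixes(prefixes: Iterable[str]) -> List[str]:
    """Keep only prefixes that are not a parent of another prefix."""
    prefixes = list(prefixes)
    return [
        p for p in prefixes
        if not any(other.startswith(p + "/") for other in prefixes)
    ]


# Sample keys copied from the question
keys = [
    "folder1/subfolder1/item1.txt",
    "folder1/subfolder1/item11.txt",
    "folder1/subfolder2/item2.txt",
    "folder1/subfolder2/item21.txt",
    "folder1/subfolder3/item3.txt",
    "folder1/subfolder3/subfolder31/item311.txt",
]

for prefix in leaf_prefixes(folder_prefixes(keys)):
    print(prefix)
```

With a real bucket, the keys list could presumably be replaced by (obj.key for obj in bucket.objects.filter(Prefix='folder1/')). The parent check compares every prefix against every other one, which is fine for a modest number of folders but may be slow for a very large one.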