How to get ALL subdirectories, all levels deep except files in AWS S3 with python boto3

Time:10-16

There are many similar questions, but I can't find an answer to exactly this one: how do I get ALL sub-directories starting from an initial one? The depth of the sub-directories is unknown.

Let's say I have:

data/subdir1/subdir2/file.csv
data/subdir1/subdir3/subdir4/subdir5/file2.csv
data/subdir6/subdir7/subdir8/file3.csv

I would like to get either a list of all sub-directories at every depth or, even better, only the paths one level above the files. In my example I would ideally get:

data/subdir1/subdir2/
data/subdir1/subdir3/subdir4/subdir5/
data/subdir6/subdir7/subdir8/

but I could work with this as well:

data/subdir1/
data/subdir1/subdir2/
data/subdir1/subdir3/
data/subdir1/subdir3/subdir4/
etc...
data/subdir6/subdir7/subdir8/

My code so far only gets me one level deep of directories:

result = await self.s3_client.list_objects(
    Bucket=bucket, Prefix=prefix, Delimiter="/"
)

subfolders = set()
for content in result.get("CommonPrefixes", []):
    print(f"sub folder : {content.get('Prefix')}")
    subfolders.add(content.get("Prefix"))

return subfolders

CodePudding user response:

import os

import boto3

s3_client = boto3.client('s3')

# list_objects returns a dictionary. The 'Contents' key contains a
# list of entries whose 'Key' is the full path, including the file name,
# for example: data/subdir1/subdir3/subdir4/subdir5/file2.csv
# Note: a single call returns at most 1000 objects.
objects = s3_client.list_objects(Bucket='bucket_name')['Contents']

# here we iterate over the full paths and, using
# os.path.dirname, get each path without the file name
for obj in objects:
    print(os.path.dirname(obj['Key']))
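
Building on the same idea, here is a minimal sketch that covers both outputs asked for in the question and also handles buckets with more than 1000 objects. The helper names (iter_keys, leaf_directories, all_directories) are hypothetical, made up for this illustration. It uses the synchronous boto3 client with the list_objects_v2 paginator, whereas the question's code awaits an async client, so it would need adapting there. It also assumes that keys ending in "/" are empty "folder" placeholder objects rather than files.

```python
def iter_keys(bucket, prefix=""):
    """Yield every object key under prefix, following pagination."""
    # Imported here so the pure helpers below work without boto3 installed.
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    # No Delimiter is passed, so keys at every depth are returned.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def leaf_directories(keys):
    """Return the directories that directly contain at least one file."""
    dirs = set()
    for key in keys:
        parent, sep, name = key.rpartition("/")
        # Skip top-level keys (no "/") and placeholder keys ending in "/".
        if sep and name:
            dirs.add(parent + "/")
    return sorted(dirs)


def all_directories(keys):
    """Return every directory level that appears in any key."""
    dirs = set()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the file name (or trailing "")
        for depth in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:depth]) + "/")
    return sorted(dirs)


if __name__ == "__main__":
    example_keys = [
        "data/subdir1/subdir2/file.csv",
        "data/subdir1/subdir3/subdir4/subdir5/file2.csv",
        "data/subdir6/subdir7/subdir8/file3.csv",
    ]
    for directory in leaf_directories(example_keys):
        print(directory)
```

With a real bucket, the call would look like leaf_directories(iter_keys("bucket_name", "data/")). Note that all_directories also includes the starting prefix itself (data/ in the example), so filter it out if only the levels below it are wanted.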