I have this code and I want only the paths that end in a file, without the intermediate empty folders. For example:
data/folder1/folder2
data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt
From those paths I only want:
data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt
I am using this code, but it also gives me paths that end in directories:
subfolders = set()
current_path = None
result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
objects = result.get("Contents")
try:
    for obj in objects:
        current_path = os.path.dirname(obj["Key"])
        if current_path not in subfolders:
            subfolders.add(current_path)
except Exception as exc:
    print(f"Getting objects with prefix: {prefix} failed")
    raise exc
CodePudding user response:
Can't you check whether the key has an extension? By the way, you don't need to check whether the path is already in the set, since a set only keeps unique items.
list_objects does not return any indicator of whether an item is a folder or a file, so this looks like the practical way.
Please check: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects
import os

subfolders = set()
result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
# "Contents" is absent when nothing matches the prefix
objects = result.get("Contents", [])
try:
    for obj in objects:
        key = obj["Key"]
        # Treat the key as a file only if its last component has an extension
        if os.path.splitext(key)[1]:
            subfolders.add(key)
except Exception as exc:
    print(f"Getting objects with prefix: {prefix} failed")
    raise exc
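The extension check can also be tried in isolation, without touching S3. This is only a minimal sketch: the helper name looks_like_file is hypothetical, and the keys are the sample paths from the question, assuming the empty folder shows up as a key ending in "/". It uses posixpath rather than os.path because S3 keys always use "/" as the separator.

```python
import posixpath


def looks_like_file(key: str) -> bool:
    """Heuristic: a key is a file if its last path component has an extension."""
    # splitext returns (root, ext); ext is "" when there is no extension
    return bool(posixpath.splitext(key)[1])


# Sample keys taken from the question (assumed, not fetched from a real bucket)
keys = [
    "data/folder1/folder2/",  # empty-folder placeholder
    "data/folder1/folder3/folder4/file1.txt",
    "data/folder5/file2.txt",
]
file_paths = {key for key in keys if looks_like_file(key)}
```

Keep in mind that this is a heuristic: a file without an extension, such as data/Makefile, would be skipped as if it were a folder.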
CodePudding user response:
I would recommend using the boto3 Bucket resource here, because it simplifies pagination.
Here is an example of how to get a list of all files in an S3 bucket:
import boto3
bucket = boto3.resource("s3").Bucket("mybucket")
objects = bucket.objects.all()
files = [obj.key for obj in objects if not obj.key.endswith("/")]
print("Files:", files)
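If only part of the bucket is of interest, the same idea can probably be limited to a prefix. The following is a sketch under assumptions: list_file_keys is a hypothetical helper name, and the bucket argument is assumed to be a boto3 Bucket resource like the one created above. It relies on the collection's filter(Prefix=...) call, which pages through the results for you.

```python
def list_file_keys(bucket, prefix=""):
    """Return keys under `prefix` that are not folder placeholders.

    `bucket` is assumed to be a boto3 Bucket resource, e.g.
    boto3.resource("s3").Bucket("mybucket").
    """
    # objects.filter() yields every matching object, handling pagination
    objects = bucket.objects.filter(Prefix=prefix)
    # Folder placeholders are objects whose key ends with "/"
    return [obj.key for obj in objects if not obj.key.endswith("/")]
```

It could then be called as list_file_keys(boto3.resource("s3").Bucket("mybucket"), "data/").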
It's worth noting that getting a list of all folders and subfolders in an S3 bucket is a more difficult problem to solve, mainly because folders don't typically exist in S3. They are logically present, but not physically present: they are implied by objects with hierarchical keys such as dogs/small/corgi.png. For ideas, see "retrieving subfolder names in S3 bucket".
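To illustrate how folders are only implied by keys, here is a minimal sketch that derives the logical folder names from a list of object keys. The helper name implied_folders is hypothetical, and the keys are made-up sample data rather than the output of a real bucket.

```python
def implied_folders(keys):
    """Return every logical folder prefix implied by the given object keys."""
    folders = set()
    for key in keys:
        # Drop the final component (the file name, or "" for a placeholder key)
        parts = key.split("/")[:-1]
        # Each leading run of components is a folder: "dogs/", "dogs/small/", ...
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]) + "/")
    return folders


# Sample keys (assumed for illustration)
keys = ["dogs/small/corgi.png", "dogs/large/husky.png", "readme.txt"]
folders = implied_folders(keys)
```

Nothing named dogs/ has to exist as an object for it to appear in the result; it is inferred purely from the keys.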