Currently, I am using this code, but it gives me all folders plus sub-folders/files for a specified s3 location. I want only the names of the folders that live in s3://production/product/:
def get_dir_content(ls_path):
    dir_paths = dbutils.fs.ls(ls_path)
    subdir_paths = [get_dir_content(p.path) for p in dir_paths if p.isDir() and p.path != ls_path]
    flat_subdir_paths = [p for subdir in subdir_paths for p in subdir]
    return list(map(lambda p: p.path, dir_paths)) + flat_subdir_paths
paths = get_dir_content('s3://production/product/')
[print(p) for p in paths]
Current output returns all folders plus sub-directories where files live, which is too much. I only need the folders on that hierarchical level of the specified s3 location (no deeper levels). How do I tweak this code?
CodePudding user response:
Just use dbutils.fs.ls(ls_path) on its own. It is not recursive: it only returns the entries directly under the given path. The recursion in your get_dir_content function is what descends into the sub-folders, so dropping it gives you exactly the top level.
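For example, a minimal sketch (assuming this runs in a Databricks notebook where dbutils is available, and using the s3://production/product/ path from your question; the helper name get_top_level_dirs is just for illustration):

def get_top_level_dirs(ls_path):
    # dbutils.fs.ls returns only the immediate children of ls_path;
    # keep the directories and return their paths, skipping files
    return [p.path for p in dbutils.fs.ls(ls_path) if p.isDir()]

for p in get_top_level_dirs('s3://production/product/'):
    print(p)

If you want just the folder names rather than full paths, each FileInfo also has a name attribute (directory names end with a trailing slash).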