How to get a list of all folders that list in a specific s3 location using spark in databricks?

Time:07-25

Currently, I am using this code, but it gives me all folders plus sub-folders/files for a specified S3 location. I want only the names of the folders that live directly in s3://production/product/:

def get_dir_content(ls_path):
  # List the entries directly under ls_path (non-recursive)
  dir_paths = dbutils.fs.ls(ls_path)
  # Recurse into every sub-directory
  subdir_paths = [get_dir_content(p.path) for p in dir_paths if p.isDir() and p.path != ls_path]
  # Flatten the nested lists of paths
  flat_subdir_paths = [p for subdir in subdir_paths for p in subdir]
  return list(map(lambda p: p.path, dir_paths)) + flat_subdir_paths
    

paths = get_dir_content('s3://production/product/')
[print(p) for p in paths]

The current output returns all folders plus the sub-directories where files live, which is too much. I only need the folders at that hierarchical level of the specified S3 location (no deeper levels). How do I tweak this code?

CodePudding user response:

Just use dbutils.fs.ls(ls_path) without the recursion. It only lists the entries directly under the given path, so it never descends into deeper levels.
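
As a minimal sketch, assuming the same s3://production/product/ location from the question, you can filter that non-recursive listing down to directory entries only:

# dbutils.fs.ls does not recurse, so this returns only the immediate children.
# Keep just the entries that are folders.
top_level_dirs = [p.path for p in dbutils.fs.ls('s3://production/product/') if p.isDir()]

for p in top_level_dirs:
    print(p)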
