How to use wildcard in hdfs file path while list out files in nested folder

Time:11-03

I'm using the code below to list files in nested folders:

    val hdfspath = "/data/retail/apps/*/landing"

    import org.apache.hadoop.fs.{FileSystem, Path}
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    fs.listStatus(new Path(hdfspath)).filter(_.isDirectory).map(_.getPath).foreach(println)

If I use the path hdfspath="/data/retail/apps" I get results, but if I use val hdfspath="/data/retail/apps/*/landing" I get a "path does not exist" error. Please help me out.


CodePudding user response:

According to this answer, you need to use globStatus instead of listStatus:

    fs.globStatus(new Path(hdfspath)).filter(_.isDirectory).map(_.getPath).foreach(println)
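The difference is that listStatus treats the string as a literal path, so it looks for a directory actually named "*", while globStatus expands glob patterns (*, ?, [...]) before listing. As a local-filesystem analogy (not Hadoop itself, and using hypothetical app1/app2 directory names), java.nio.file's glob matcher shows the same semantics:

```scala
import java.nio.file.{FileSystems, Paths}

// A glob matcher expands the pattern; a literal path lookup would not.
// "*" matches exactly one directory level, e.g. "app1" or "app2".
val matcher = FileSystems.getDefault.getPathMatcher("glob:/data/retail/apps/*/landing")

// Paths the wildcard is intended to cover (hypothetical app names):
assert(matcher.matches(Paths.get("/data/retail/apps/app1/landing")))
assert(matcher.matches(Paths.get("/data/retail/apps/app2/landing")))

// A path with no directory at the "*" level does not match -- just as
// a literal lookup of the pattern string finds no directory named "*",
// hence the "path does not exist" error from listStatus.
assert(!matcher.matches(Paths.get("/data/retail/apps/landing")))
```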