For a project I am currently working on with Scala and Spark, I have to write code that checks whether the HDFS directory I am working on is empty, and if it is not, removes every file from the directory.
Before I deploy my code to Azure, I am testing it with a local directory on my computer.
I am starting by making a method that deletes every file from this directory. This is what I have so far:
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object DirectoryCleaner {

  val spark: SparkSession = SparkSession.builder()
    .master("local[3]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  val srcPath = new Path("C:\\Users\\myuser\\Desktop\\test_dir\\file1.csv")

  def deleFilesDir(): Unit = {
    if (fs.exists(srcPath) && fs.isFile(srcPath))
      fs.delete(srcPath, true)
  }

}
With this code, I am able to delete a single file (file1.csv). I would like to be able to define my path as val srcPath = new Path("C:\\Users\\myuser\\Desktop\\test_dir") (without specifying any filename), and just delete every file from the test_dir directory. Any idea how I could do that?
Thanks for helping.
CodePudding user response:
Use fs.listFiles to get all the files in the directory, then loop through them and delete each one. Also, set the recursive flag to false so that you don't recurse into subdirectories.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
def deleteAllFiles(directoryPath: String, fs: FileSystem): Unit = {
  val path = new Path(directoryPath)
  // list the files directly under the directory (recursive = false)
  val files = fs.listFiles(path, false)
  // delete each file, non-recursively
  while (files.hasNext) {
    val file = files.next()
    fs.delete(file.getPath, false)
  }
}
// Example for a local, non-HDFS path
val directoryPath = "file:///Users/m_vemuri/project"
val fs = FileSystem.get(new Configuration())
deleteAllFiles(directoryPath, fs)
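Since your original requirement is to first check whether the directory is empty and only clean it when it is not, here is a minimal sketch of how that check could be combined with the deletion loop above. The names isDirectoryEmpty and cleanDirectoryIfNotEmpty are hypothetical helpers introduced only for illustration; the check itself simply relies on the iterator returned by fs.listFiles.

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: returns true when the directory contains no files.
def isDirectoryEmpty(directoryPath: String, fs: FileSystem): Boolean = {
  val path = new Path(directoryPath)
  // listFiles with recursive = false only inspects the directory itself
  !fs.listFiles(path, false).hasNext
}

// Hypothetical wrapper: only clean the directory when it is not empty.
def cleanDirectoryIfNotEmpty(directoryPath: String, fs: FileSystem): Unit = {
  if (!isDirectoryEmpty(directoryPath, fs)) {
    deleteAllFiles(directoryPath, fs)
  }
}

On Azure/HDFS the same code should work if you reuse the FileSystem built from spark.sparkContext.hadoopConfiguration, as in your DirectoryCleaner object, instead of a bare new Configuration().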