Scala most efficient way to process files in folder based on a file list

Time:09-25

I am trying to find the most efficient way to process files in multiple folders based on a list of allowed files.

I have a list of allowed files that I should process.

The process is as follows:

  1. val allowedFiles = List("File1.json","File2.json","File3.json")
  2. Get list of folders in directory. For this I could use:
      def getListOfSubDirectories(dir: File): List[String] =
            dir.listFiles
               .filter(_.isDirectory)
               .map(_.getName)
               .toList
  3. Loop through each folder from step 2 and get all files. For this I would use:
    def getListOfFiles(dir: String):List[File] = {
        val d = new File(dir)
        if (d.exists && d.isDirectory) {
            d.listFiles.filter(_.isFile).toList
        } else {
            List[File]()
        }
    }
  4. If a file from step 3 is in the list of allowed files, call another method that processes the file.

So I need to loop through the first directory, get the files, check whether each file needs to be processed, and then call another function. I was thinking about a double loop, which would work, but is it the most efficient way? I know that in Scala I should be using recursive functions, but I failed to write this double recursion with a call to an extra method.
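
For comparison, the double loop described above can be written as a single for-comprehension. This is only a minimal sketch, assuming the allowed files sit exactly one folder below the root. The names children, allowedFilesIn and processAllowed, and the println body of processFile, are hypothetical placeholders rather than part of the original code.

```scala
import java.io.File

// Hypothetical stand-in for the real processing step.
def processFile(file: File): Unit = println(s"processing ${file.getPath}")

// listFiles returns null when the path is not a readable directory,
// so wrap the result in Option before converting it to a List.
def children(dir: File): List[File] =
  Option(dir.listFiles).map(_.toList).getOrElse(Nil)

// Outer generator: sub-folders of root. Inner generator: allowed files in each sub-folder.
def allowedFilesIn(root: File, allowed: Set[String]): List[File] =
  for {
    folder <- children(root) if folder.isDirectory
    file   <- children(folder) if file.isFile && allowed(file.getName)
  } yield file

def processAllowed(root: File, allowed: Set[String]): Unit =
  allowedFilesIn(root, allowed).foreach(processFile)
```

Passing the allowed names as a Set (for example allowedFiles.toSet) keeps each membership check cheap even when the list grows.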

Any ideas welcome.

CodePudding user response:

Files.find() will do both the depth search and filter.

import java.nio.file.{Files,Paths,Path}
import scala.jdk.StreamConverters._

def getListOfFiles(dir: String, targets:Set[String]): List[Path] =
  Files.find( Paths.get(dir)
            , 999
            , (p, _) => targets(p.getFileName.toString)
            ).toScala(List)

usage:

val lof = getListOfFiles("/DataDir",  allowedFiles.toSet)

But, depending on what kind of processing is required, instead of returning a List you might just process each file as it is encountered.

import java.nio.file.{Files,Paths,Path}

def processFile(path: Path): Unit = ???
  
def processSelected(dir: String, targets:Set[String]): Unit =
  Files.find( Paths.get(dir)
            , 999
            , (p, _) => targets(p.getFileName.toString)
            ).forEach(processFile)
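
One caveat with both variants above: the stream returned by Files.find holds open directory handles until it is closed, and the Javadoc recommends closing it explicitly. A minimal sketch of one way to do that with scala.util.Using (Scala 2.13+) follows. The name processSelectedSafely and its process parameter are hypothetical, and the extra isRegularFile check is an added assumption that only regular files should match.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.util.{Try, Using}

// Walks dir, calls process on every regular file whose name is in targets,
// and closes the underlying stream afterwards, even if process throws.
def processSelectedSafely(dir: String, targets: Set[String])(process: Path => Unit): Try[Unit] =
  Using(
    Files.find(
      Paths.get(dir),
      Int.MaxValue, // no practical depth limit
      (p, attrs) => attrs.isRegularFile && targets(p.getFileName.toString)
    )
  ) { stream =>
    stream.forEach(p => process(p))
  }
```

usage (any exception ends up in the returned Try instead of being thrown):

```scala
val result = processSelectedSafely("/DataDir", Set("File1.json", "File2.json")) { path =>
  println(s"processing $path")
}
```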

CodePudding user response:

You can use Files.walk.
The code would look like this (I didn't compile it, so it may have some typos):

import java.nio.file.{Files, Path}
import scala.jdk.StreamConverters._

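def getFilesRecursive(initialFolder: Path, allowedFiles: Set[String]): List[Path] = {
  // Lower-case the allowed names as well, so the case-insensitive comparison below can match
  val allowed = allowedFiles.map(_.toLowerCase)
  Files
    .walk(initialFolder)
    .filter(path => allowed.contains(path.getFileName.toString.toLowerCase))
    .toScala(List)
}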