How can I build a List of java.io.File objects in a thread-safe and/or stateless way?

Time:08-13

I am writing a test Spring Boot app and I want to build it to be thread-safe from the ground up. For this question, let's assume the app is a simple REST API which returns a list of file and directory names from the local OS filesystem where the app resides, based on a specified path (provided by the user as a GET parameter when invoking the REST API).

I appreciate that horizontal scaling could be achieved using containers/Kubernetes, event/queue-based architectures and other such approaches; however, I am not interested in using those methods at present (unless you folks suggest this is the only elegant solution to my question). Therefore, please assume the platform is a JVM running on a single multicore (Linux) OS instance/server.

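```java
import java.io.File;
import java.util.List;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MyController {

    private final FileService fileService;

    // Constructor injection: Spring passes in its single (singleton) FileService bean
    public MyController(FileService fileService) {
        this.fileService = fileService;
    }

    /** Returns a JSON-formatted list of the file and directory names contained
     *  in a given path when
     *  http://[server:port]/rest/browse?path=[UserSpecifiedPath]
     *  is requested by a client. */
    @GetMapping("/rest/browse")
    public List<File> browseFiles(@RequestParam(value = "path", required = false) String pathName) {
        return fileService.list(pathName);
    }
}
```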

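```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

import org.springframework.stereotype.Service;

@Service
public class FileService {

    // Service method to return an alphabetically sorted list of non-hidden files & directories
    public List<File> list(String pathName) {
        File[] entries = new File(pathName).listFiles();
        if (entries == null) { // listFiles() returns null if pathName is not a directory (or on I/O error)
            return List.of();
        }
        return Arrays.asList(Arrays.stream(entries)
                .parallel()
                .filter(file -> !file.getName().startsWith("."))
                .sorted(Comparator.comparing(File::getName))
                .toArray(File[]::new));
    }
}
```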

The code that actually returns the sorted list of files & dirs is quite dense and leans on Java's Arrays utility class and streams, as well as a lambda function. I am unfamiliar with the underlying code of Arrays (and how to reason about its behavior), as well as the way the lambda function will interact with it. I am keen to limit the use of synchronization/locking to resolve this issue, as I want FileService to be as parallelizable as possible.

My concern is related to FileService:
  • I have instantiated FileService as a singleton (thanks to Spring Boot's default behaviour)
  • Spring's Controller/servlet is multithreaded insofar as each request has at least one thread
  • FileService's use of the Arrays code, together with the lambda function that operates on a new java.io.File object to populate a List, does not appear to me to be atomic
  • Therefore, multiple threads representing multiple requests could be executing different portions of fileService at once, creating unpredictable results
  • Even if the Spring Boot framework somehow handles this particular issue behind the scenes, if I want to add some hitherto unwritten concurrency to the controller or another part of the app in future, I will still have a fileService.list() that is not thread-safe, and my app will therefore produce unpredictable results due to multiple threads messing with the File object instantiated in fileService.list()

The above represents my best attempt at reasoning about why my code has problems and is possibly stateful. I appreciate there are gaps in my knowledge (clearly, I could do a deep dive into the Arrays class and lambda functions), and I likely do not fully understand the concept of state itself and am getting myself twisted up over nothing. I have always found state a bit confusing, given that even supposedly stateless languages must store state somewhere (an application has to keep its variables in memory at some point, as they are passed between operations).

Is my reasoning above correct? How can I write FileService to be stateless?

Answer:

I think that your reasoning is flawed. I actually can't see any reason why this is not thread-safe.

  1. I have instantiated FileService as a singleton (thanks to Spring Boot's default behavior)

That doesn't actually affect thread-safety in this case.

  2. Spring's Controller/Servlet is multi-threaded insofar as each request has at least one thread.

Well, yes. There could be multiple threads calling FileService.list(...) at the same time, so thread-safety is relevant.

  3. FileService's use of the Arrays code, together with the lambda function that operates on a new java.io.File object to populate a List, does not appear to me to be atomic.

The fact that it is not atomic (at a certain level) is not actually relevant.

  4. Therefore, multiple threads representing multiple requests could be executing different portions of FileService at once, creating unpredictable results.

Well ... no. In fact requests do not interact with the state of FileService at all. Each request has its own state separate from all other simultaneous (or otherwise) requests.
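A minimal sketch of that point, outside Spring so it runs on its own: one shared instance of a field-less service is called from 100 concurrent tasks. NameService and SharedStatelessDemo are hypothetical names, and the pipeline is the question's filter-and-sort applied to plain strings instead of File objects, so no real directory is needed. Each call builds its own stream, array and list, so every task should get the right answer for its own input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for FileService: no fields, so it holds no state between calls.
class NameService {
    List<String> visibleSorted(String[] names) {
        // Same shape as the question's pipeline, applied to names instead of File objects.
        return Arrays.asList(Arrays.stream(names)
                .parallel()
                .filter(name -> !name.startsWith("."))
                .sorted()
                .toArray(String[]::new));
    }
}

class SharedStatelessDemo {
    // Submits 100 concurrent calls to ONE shared NameService (like a Spring singleton bean)
    // and returns the results in submission order.
    static List<List<String>> runConcurrently() {
        NameService shared = new NameService();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                String[] input = {"c" + i, ".hidden" + i, "a" + i, "b" + i};
                futures.add(pool.submit(() -> shared.visibleSorted(input)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently().get(7)); // [a7, b7, c7]
    }
}
```

No locking is involved: the only object all tasks touch is the NameService instance, and it has no fields to modify.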

The facts that are relevant to thread-safety are as follows:

  • The list method is not referring to any fields of the FileService class, or any other of your domain classes. Therefore it doesn't share state via instance or static fields.

  • The new File(pathName).listFiles() fragment creates a thread confined File object and a thread confined array. We can assume that this is thread-safe. (The javadocs don't mention it, but there is no reason for File to interact with other threads.)

Terminology: a "thread confined" object is an object that only one thread can ever access. It is "confined" to that thread.

  • The Arrays.stream(...)...toArray(...) chain is using a thread confined Stream to create an array.

  • The Stream uses parallel() which potentially involves using other (worker) threads. However, the stream pipeline downstream is just acting on File objects in the stream, so there are no thread-safety concerns there.

  • Finally, the Arrays.asList(...) is wrapping the array as a thread confined List.
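The last three bullets can be checked in isolation. The sketch below (hypothetical class PipelineDemo, using strings instead of File objects) runs the question's pipeline once sequentially and once with parallel(), and compares the results. Arrays.stream gives an ordered stream, sorted() and toArray() respect encounter order, and the worker threads only ever handle individual elements, so both runs should produce the same sorted list. It also shows what Arrays.asList actually returns: a fixed-size List view backed by the array, not a copy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

class PipelineDemo {
    // Hypothetical input: names in descending order, with every fifth one "hidden" (dot-prefixed).
    static String[] sampleNames(int n) {
        String[] names = new String[n];
        for (int i = 0; i < n; i++) {
            int k = n - 1 - i;
            names[i] = (k % 5 == 0 ? "." : "") + String.format(Locale.ROOT, "f%05d", k);
        }
        return names;
    }

    // The question's pipeline applied to strings, optionally run in parallel.
    static List<String> visibleSorted(String[] names, boolean parallel) {
        Stream<String> stream = Arrays.stream(names); // an ORDERED stream over the array
        if (parallel) {
            stream = stream.parallel(); // worker threads each process a chunk of elements
        }
        return Arrays.asList(stream
                .filter(name -> !name.startsWith("."))
                .sorted()
                .toArray(String[]::new)); // toArray keeps the sorted encounter order
    }

    // Arrays.asList returns a fixed-size view backed by the array, not a copy.
    static boolean asListIsFixedSizeView() {
        String[] backing = {"a", "b"};
        List<String> view = Arrays.asList(backing);
        view.set(0, "z"); // writes through to backing[0]
        try {
            view.add("c"); // structural changes are not supported
            return false;
        } catch (UnsupportedOperationException expected) {
            return backing[0].equals("z");
        }
    }

    public static void main(String[] args) {
        String[] names = sampleNames(10_000);
        System.out.println(visibleSorted(names, true).equals(visibleSorted(names, false))); // true
        System.out.println(asListIsFixedSizeView()); // true
    }
}
```

Because the array produced by toArray is never referenced anywhere except inside the list that wraps it, the view is confined to the calling thread along with the array.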

The key thing here is none of this code is going to share state with another thread.

Since there is no sharing of state between threads, there is no possibility that one thread's actions can affect any other thread's behavior. The thread-confined objects don't need to be thread-safe, because we know that no other threads can access them.
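To make the absence of shared state easier to see, here is a sketch of the same listing logic unpacked into explicit steps. UnpackedLister is a hypothetical name; it runs sequentially rather than in parallel, and it adds a null check because File.listFiles() returns null when the path is not a directory (or an I/O error occurs). Every object it touches is created inside the call and reachable only through local variables, which is exactly what "thread confined" means here.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, sequential rewrite of FileService.list: every object below is local to one call.
class UnpackedLister {
    static List<File> list(String pathName) {
        File dir = new File(pathName);          // new object, referenced only by this local variable
        File[] entries = dir.listFiles();       // a fresh array on every call
        if (entries == null) {                  // not a directory, or an I/O error occurred
            return List.of();
        }
        List<File> visible = new ArrayList<>(); // ArrayList is not thread-safe, but this one is never shared
        for (File entry : entries) {
            if (!entry.getName().startsWith(".")) {
                visible.add(entry);
            }
        }
        visible.sort(Comparator.comparing(File::getName));
        return visible;                         // ownership passes to the single caller
    }

    // Builds a throwaway directory (deleted on JVM exit) and lists it twice.
    static List<List<File>> demo() {
        try {
            Path dir = Files.createTempDirectory("lister-demo");
            dir.toFile().deleteOnExit();
            for (String name : new String[] {"b.txt", "a.txt", ".hidden"}) {
                Files.createFile(dir.resolve(name)).toFile().deleteOnExit();
            }
            return List.of(list(dir.toString()), list(dir.toString()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<List<File>> results = demo();
        for (File f : results.get(0)) {
            System.out.println(f.getName()); // a.txt, then b.txt
        }
        System.out.println(results.get(0) != results.get(1)); // true: each call built its own list
    }
}
```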

Therefore, this code is thread-safe.

And the FileService class is stateless too. (There is no state associated with an instance of FileService.)


There is a caveat to the above. If some other thread were modifying the directory you are listing with listFiles() while a list call was in progress, it is not clear which files the call would see. (This is potentially operating-system specific.)

If it was a hard requirement of your "file service" that list should provide a view that represents a point-in-time snapshot, then this code doesn't do that. But that is a fundamental property of File.listFiles, and I don't think there is a way to solve that.


Finally, I recommend that you read a good book or take a good course on Java concurrent programming. You need to understand the basics before you can reason about this kind of problem.

Answer:

"Therefore, multiple threads representing multiple requests could be executing different portions of fileService at once, ..."

Correct.

"... creating unpredictable results"

No. Each thread has its own method invocation, which has its own local variables, which reference their own objects. Since each thread uses its own objects, threads don't interact at all, and cannot possibly interfere with each other's work, making this code trivially thread safe.

Put differently, thread safety issues only arise when several threads use the same object. If they use different objects (or the objects are immutable), the code is trivially thread safe.
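As a sketch of the "immutable" case, suppose you wanted a configurable set of hidden-name prefixes. A service may keep a field and remain thread-safe as long as that field is final and refers to an immutable object. HiddenPrefixService, its prefixes and ImmutableSharingDemo are hypothetical, and the code assumes Java 10 or later for List.copyOf, which returns an unmodifiable copy that every thread can read without synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical variant of FileService: it has state, but that state is final and immutable.
final class HiddenPrefixService {
    private final List<String> hiddenPrefixes; // read by every thread, never modified after construction

    HiddenPrefixService(List<String> hiddenPrefixes) {
        this.hiddenPrefixes = List.copyOf(hiddenPrefixes); // unmodifiable defensive copy
    }

    boolean isVisible(String name) {
        return hiddenPrefixes.stream().noneMatch(name::startsWith);
    }

    List<String> visibleSorted(String[] names) {
        return Arrays.asList(Arrays.stream(names)
                .filter(this::isVisible)
                .sorted()
                .toArray(String[]::new));
    }
}

class ImmutableSharingDemo {
    // 1,000 parallel calls against ONE shared instance; counts how many got the expected answer.
    static long correctCalls() {
        HiddenPrefixService shared = new HiddenPrefixService(List.of(".", "~"));
        return IntStream.range(0, 1_000)
                .parallel()
                .filter(i -> shared.visibleSorted(new String[] {"b" + i, "~tmp", ".git", "a" + i})
                        .equals(List.of("a" + i, "b" + i)))
                .count();
    }

    // Changing the caller's list later does not affect the service's private copy.
    static boolean copyIsIsolated() {
        List<String> prefixes = new ArrayList<>(List.of("."));
        HiddenPrefixService service = new HiddenPrefixService(prefixes);
        prefixes.add("a");
        return service.isVisible("abc");
    }

    public static void main(String[] args) {
        System.out.println(correctCalls());   // 1000
        System.out.println(copyIsIsolated()); // true
    }
}
```

The defensive copy matters: if the service simply stored the caller's ArrayList, the caller could still mutate it, and the object would no longer be immutable in practice.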

"As Spring abstracts away the creation of each new controller object and its own multithreading, I struggled to understand correctly what was going on with respect to threads and method invocation in the controller."

By default, a controller is a singleton bean (one instance per application context), so the same controller object will be shared by all threads.

"To put it another way, for each chain of logic emanating out of a Spring Controller, I can basically not concern myself with how per-request threads are behaving."

Not quite. If these threads modify shared objects, you are responsible for synchronizing access. It's just that usually, you will not share any mutable objects, which trivially fulfills this requirement. In your case, the controller and FileService objects are shared, but not mutable, while the Lists, Streams, arrays, and File objects are not shared, so there aren't any mutable shared objects.
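For contrast, here is a minimal sketch of state that really is shared and mutable: a hypothetical request counter kept in a singleton-style service (CountingService and CounterDemo are illustrative names). With a plain long field, requests++ is a read, an add and a write, so two threads can read the same value and one increment is lost. AtomicLong makes the whole update atomic without an explicit lock; a synchronized method would also work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stateful service: the counter is shared, mutable state.
class CountingService {
    // A plain "long requests" with "requests++" could lose updates under concurrency.
    private final AtomicLong requests = new AtomicLong();

    void recordRequest() {
        requests.incrementAndGet(); // one atomic read-modify-write
    }

    long total() {
        return requests.get();
    }
}

class CounterDemo {
    // Starts 'threads' workers that each record 'callsPerThread' requests on ONE shared service.
    static long run(int threads, int callsPerThread) {
        CountingService shared = new CountingService();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    shared.recordRequest();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.total();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 10_000)); // 80000
    }
}
```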

However, if you were to optimize the performance of your FileService with an in-memory cache such as

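```java
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.stereotype.Service;

@Service
class FileService {
    // danger: a plain HashMap shared by every request thread, with no synchronization
    Map<String, List<File>> cache = new HashMap<>();

    public List<File> list(String pathName) {
        var result = cache.get(pathName);
        if (result == null) {            // check-then-act: two threads can both see null here
            result = ...;
            cache.put(pathName, result); // unsynchronized concurrent puts can leave a HashMap inconsistent
        }
        return result;
    }
}
```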

all threads would share the same controller object, and the same FileService, and therefore the same Map, requiring you to synchronize access to the Map somehow.
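One common way to do that (a sketch, not the only option) is to replace the HashMap with a java.util.concurrent.ConcurrentHashMap and let computeIfAbsent do the check-and-insert atomically. The names CachingFileService and CacheDemo and the cache-forever policy are assumptions for illustration. Because the cached lists are now handed to every caller, they are made unmodifiable with List.of. Also note that a cache like this never notices later changes to the directory, and that other updates to the map may block while the loader runs, so the loader should be quick.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical thread-safe version of the caching FileService (Spring's @Service omitted).
class CachingFileService {
    private final ConcurrentMap<String, List<File>> cache = new ConcurrentHashMap<>();

    public List<File> list(String pathName) {
        // ConcurrentHashMap.computeIfAbsent is atomic: the loader runs at most once per key,
        // and other callers for that key block until the value is available.
        return cache.computeIfAbsent(pathName, CachingFileService::load);
    }

    private static List<File> load(String pathName) {
        File[] entries = new File(pathName).listFiles();
        if (entries == null) {
            return List.of(); // not a directory, or an I/O error; note this result is cached too
        }
        // Cached lists are shared by every caller, so make them unmodifiable (List.of copies the array).
        return List.of(Arrays.stream(entries)
                .filter(file -> !file.getName().startsWith("."))
                .sorted(Comparator.comparing(File::getName))
                .toArray(File[]::new));
    }
}

class CacheDemo {
    // Creates a throwaway directory (deleted on JVM exit) with two visible files and one hidden file.
    static Path sampleDir() {
        try {
            Path dir = Files.createTempDirectory("cache-demo");
            dir.toFile().deleteOnExit();
            for (String name : new String[] {"b.txt", "a.txt", ".hidden"}) {
                Files.createFile(dir.resolve(name)).toFile().deleteOnExit();
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CachingFileService service = new CachingFileService();
        String path = sampleDir().toString();
        List<File> first = service.list(path);
        System.out.println(first == service.list(path)); // true: the second call is a cache hit
    }
}
```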
