I was investigating how Project Loom works and what kind of benefits it can bring to my company.
So I understand the motivation: for a standard servlet-based backend there is always a thread pool that executes the business logic, and once a thread is blocked on I/O it can't do anything but wait. So let's say I have a backend application with a single endpoint. The business logic behind this endpoint reads some data using JDBC, which internally uses an InputStream, which in turn makes a blocking system call (read() in Linux terms). So if I have 200 users reaching this endpoint, I need to create 200 threads, each waiting for I/O.
Now let's say I switched the thread pool to use virtual threads instead. According to Ben Evans in the article Going inside Java’s Project Loom and virtual threads:
Instead, virtual threads automatically give up (or yield) their carrier thread when a blocking call (such as I/O) is made.
So as far as I understand, if I have as many OS threads as CPU cores and an unbounded number of virtual threads, all OS threads will still wait for I/O, and the executor service won't be able to assign new work to virtual threads because there are no available threads to execute it. How is this different from regular threads? At least with OS threads I can scale to a thousand to increase throughput. Or did I just misunderstand the use case for Loom? Thanks in advance.
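To make the scenario concrete, here is a small sketch of what I mean. The pool size and timings are made up, and Thread.sleep stands in for the blocking JDBC read, since a real call would need a database:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SaturatedPool {
    // A pool sized to "CPU core count" running tasks that block on I/O.
    // sleep() is a stand-in for the blocking JDBC/InputStream read.
    static long run(int poolSize, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(200); // thread blocked "in I/O": it can do nothing else
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return (System.nanoTime() - start) / 1_000_000; // elapsed millis
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 threads, 8 blocking tasks: two "waves", so roughly 400 ms total,
        // because a blocked platform thread cannot pick up other work.
        System.out.println("elapsed ms: " + run(4, 8));
    }
}
```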
Addendum
I just read this mailing list:
Virtual threads love blocking I/O. If the thread needs to block in say a Socket read then this releases the underlying kernel thread to do other work
I am not sure I understand it. There is no way for the OS to release the thread if it makes a blocking call such as read; for these purposes the kernel has non-blocking syscalls such as epoll, which doesn't block the thread and immediately returns a list of file descriptors that have data available. Does the quote above imply that, under the hood, the JVM will replace a blocking read with a non-blocking epoll if the thread that called it is virtual?
CodePudding user response:
Your first excerpt is missing the important point:
Instead, virtual threads automatically give up (or yield) their carrier thread when a blocking call (such as I/O) is made. This is handled by the library and runtime [...]
The implication is this: if your code makes a blocking call into the library (for example NIO), the library detects that you called it from a virtual thread and will turn the blocking call into a non-blocking call, park the virtual thread, and continue processing some other virtual thread's code.
Only if no virtual thread is ready to execute will a native thread be parked.
Note that your code never calls a blocking syscall directly; it calls into the Java libraries (which currently execute the blocking syscall). Project Loom replaces the layers between your code and the blocking syscall and can therefore do anything it wants, as long as the result looks the same to your calling code.
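As a concrete sketch of that redirection (assuming a JDK with virtual threads, i.e. 21 or newer): the following code performs an ordinary "blocking" InputStream.read() on a socket from a virtual thread. The calling code is the plain blocking style; the switch to a non-blocking mechanism and the parking happen inside the JDK's socket implementation. The class and helper names here are mine, not from the JDK:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

public class LibraryRedirect {
    // A virtual thread does a plain InputStream.read() on a socket. The JDK's
    // socket implementation notices the caller is virtual, registers the socket
    // with an internal poller, and parks the virtual thread instead of blocking
    // a kernel thread. The code looks identical to the platform-thread version.
    static int readOnVirtualThread() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral local port
            int[] result = new int[1];
            Thread vt = Thread.ofVirtual().start(() -> {
                try (Socket s = server.accept(); InputStream in = s.getInputStream()) {
                    result[0] = in.read(); // looks blocking; parks the virtual thread
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 OutputStream out = client.getOutputStream()) {
                Thread.sleep(100); // let the virtual thread park in read()
                out.write(42);     // data arrives, the poller unparks the virtual thread
            }
            vt.join();
            return result[0];
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readOnVirtualThread()); // 42
    }
}
```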
CodePudding user response:
The Answer by Thomas Kläger is correct. I’ll add a few thoughts.
So as far as I understand, if I have as many OS threads as CPU cores and an unbounded number of virtual threads, all OS threads will still wait for IO
No, incorrect, you misunderstand.
What you describe is what happens under current threading technology in Java. With a one-to-one mapping of Java thread to host OS thread, any call made in Java that blocks (waiting a relatively long time for a response) leaves that host thread twiddling its thumbs, doing no work. This would not be a problem if the host had a zillion threads so that other threads could be scheduled for work on a CPU core. But host OS threads are quite expensive, so we do not have a zillion, we have very few.
Using Project Loom technology, the JVM detects the blocking call, such as waiting for I/O. Once detected, the JVM sets aside (“parks”) the virtual thread as it waits for I/O response. The JVM assigns a different virtual thread to that host OS carrier thread, so that “real” thread may continue performing work rather than waiting while twiddling its thumbs. Since the virtual threads living within the JVM are so cheap (highly efficient with both memory and CPU), we can have thousands, even millions, for the JVM to juggle.
In your example of 200 threads each waiting for an IO response from JDBC calls to a database: if those were virtual threads, they would all be parked within the JVM. The few host OS threads used as carrier threads by your ExecutorService will be working on other virtual threads that are not currently blocked. This parking and rescheduling of blocked-then-unblocked virtual threads is handled automatically by the Project Loom technology within the JVM, with no intervention needed by us Java app developers.
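A rough sketch of that behavior, assuming JDK 21+, with Thread.sleep standing in for the blocking JDBC call:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyVirtualThreads {
    // Thousands of virtual threads all "blocking" at once. Each one is parked
    // by the JVM while it waits, so a handful of carrier threads suffices.
    static int runBlockedTasks(int count) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                exec.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(200)); // parked, carrier stays free
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 10,000 concurrent 200 ms "blocking" calls finish in roughly 200 ms of
        // wall time; a fixed pool of 10,000 platform threads would be far costlier.
        System.out.println(runBlockedTasks(10_000));
    }
}
```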
let's say I switched a thread pool to use virtual threads
Actually, there is no pool of virtual threads. Each virtual thread is fresh and new, with no recycling. This eliminates worrying about thread-local contamination.
ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor() ;
…
executorService.submit( someTask ) ; // Every task submitted gets assigned to a fresh new virtual thread.
To learn more, I highly recommend viewing the videos of presentations and interviews by Ron Pressler or Alan Bateman, members of the Project Loom team. Find the most recent, as Loom has been evolving.
And read the new Java JEP, JEP draft: Virtual Threads (Preview).
CodePudding user response:
I finally found an answer. So as I said, by default the InputStream.read method makes a read() syscall, which according to the Linux man pages will block the underlying OS thread. So how is it possible that Loom won't block it? I found an article that shows the stacktrace. If this block of code is executed by a virtual thread:
URLData getURL(URL url) throws IOException {
    try (InputStream in = url.openStream()) { // blocking call
        return new URLData(url, in.readAllBytes());
    }
}
the JVM runtime will transform it into the following stacktrace:
java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:60)//this line parks the virtual thread
java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:184)
java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:212)
java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:356)//the JVM runtime replaces the actual read() with a read from the java nio package
java.base/java.io.InputStream.readAllBytes(InputStream.java:346)
How does the JVM know when to unpark the virtual thread? Here is the stacktrace that runs once readAllBytes is finished:
"Read-Poller" #16
java.base@17-internal/sun.nio.ch.KQueue.poll(Native Method)
java.base@17-internal/sun.nio.ch.KQueuePoller.poll(KQueuePoller.java:65)
java.base@17-internal/sun.nio.ch.Poller.poll(Poller.java:195)
The author of the article uses macOS, which uses kqueue as its non-blocking syscall. If I ran it on Linux, I would see an epoll syscall.
So basically Loom doesn't introduce anything new: under the hood it's a plain epoll syscall with callbacks, which could also be implemented with a framework such as Vert.x (which uses Netty under the hood). But in Loom the callback logic is encapsulated within the JVM runtime, which I found counter-intuitive: when I call InputStream.read() I expect a corresponding read() syscall, but the JVM will replace it with non-blocking syscalls.