Parallel threading: Python GIL vs Java

Time:04-27

I know that Python has a GIL that prevents threads from running at the same time, so threading is just context switching.

Why is Java different? Threads on the same CPU cannot run in parallel in any language.

  1. Does creating a new thread in Java make use of the cores in a multi-core machine?

  2. Can Python only spawn threads on the same CPU, in contrast to Java?

  3. If 1. is the case, then when using more threads than CPUs, does even Java fall back to context switching for several of them?

  4. If 1. is the case, then how does it differ from multiprocessing? Because utilizing multiple cores isn't guaranteed?

  5. Isn't the whole point of threading being able to use the same memory space? If Java does run some of them in multiple threads for parallelism, how do they really share memory?

Thank you

CodePudding user response:

Why is java different?

Because it is able to effectively use multiple cores at the same time.

  1. Does creating a new thread in Java utilize cores in a multi-core machine?

Yes.

  2. Python can only spawn threads on the same CPU, in contrast to Java?

Java can spawn multiple threads which will run on different CPUs. Java is not responsible for the actual thread scheduling. That is handled by the OS, and the OS may reschedule a thread to a different CPU from the one that it started on.

I am not sure about the precise details for Python, but I think the GIL is an implementation detail rather than something that is intrinsic to the language itself [1]. But either way, in a Python implementation that has a GIL, you would get little performance benefit from spawning threads on multiple cores.
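As a rough illustration of the point above, here is a minimal sketch that splits a CPU-bound sum across one thread per core. The class name ParallelSum and its helper methods are made up for this example. Whether the workers actually land on different cores is up to the OS scheduler, so any speed-up is likely but not guaranteed.

```java
class ParallelSum {
    // Sums the integers in [from, to) on the calling thread.
    static long sumRange(long from, long to) {
        long total = 0;
        for (long i = from; i < to; i++) {
            total += i;
        }
        return total;
    }

    // Splits [0, n) into one slice per thread and adds up the partial sums.
    static long parallelSum(long n, int threadCount) {
        long[] partial = new long[threadCount];
        Thread[] workers = new Thread[threadCount];
        long chunk = n / threadCount;
        for (int t = 0; t < threadCount; t++) {
            final int index = t;
            final long from = index * chunk;
            // The last slice also takes whatever remainder is left over.
            final long to = (index == threadCount - 1) ? n : from + chunk;
            workers[t] = new Thread(() -> partial[index] = sumRange(from, to));
            workers[t].start(); // the OS decides which core runs this thread
        }
        long total = 0;
        for (int t = 0; t < threadCount; t++) {
            try {
                // join() also guarantees the worker's write to partial[t] is visible here.
                workers[t].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores reported by the JVM: " + cores);
        System.out.println("sum = " + parallelSum(200_000_000L, cores));
    }
}
```

Watching a CPU monitor while this runs with a large n should show several cores busy at once, which is what the equivalent pure-Python threaded loop would not show under a GIL.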

  3. If 1. is the case, when using more threads than CPUs does it come back to context switching in Java?

It depends. When switching a CPU between threads belonging to different processes, a full context switch is involved. But when switching between threads in the same process, only the (user) registers need to be switched. (The virtual memory registers and caches don't need to be switched / flushed because the threads share the same virtual address space.)

  4. If 1. is the case, then how does it differ from multiprocessing? Because utilizing multiple cores isn't guaranteed?

The key difference between multi-threading and multi-processing is that processes do not share any memory. By contrast, one thread in a process can see the memory of all of the others ... modulo issues of when changes are visible.

This difference has a variety of consequences.

  5. Isn't the whole point of threading being able to use the same memory space?

Yes, that is the main point ... when you compare multi-threading with multi-processing.

If Java does run some of them in multiple threads for parallelism ...

Java supports threads for many reasons. Parallelism is only one of those reasons. Others include multiplexing I/O and simplifying certain kinds of programming problems. These other reasons are also relevant to Python.
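To make the I/O point concrete, here is a small sketch in which each thread spends its time blocked rather than computing. The class OverlappedWaits is hypothetical, and Thread.sleep stands in for a real blocking call such as a socket read. The waits overlap, so the total elapsed time is close to one wait rather than the sum of all of them, even on a single core.

```java
class OverlappedWaits {
    // Stand-in for a blocking I/O call (an assumption for this sketch;
    // a real program would be reading from a socket or a file here).
    static void blockingCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs `tasks` blocking calls, each on its own thread, and returns
    // the elapsed wall-clock time in milliseconds.
    static long runConcurrently(int tasks, long millisEach) {
        long start = System.nanoTime();
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            threads[i] = new Thread(() -> blockingCall(millisEach));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Four 200 ms waits: roughly 200 ms in total when overlapped,
        // versus roughly 800 ms if they ran one after another.
        System.out.println("elapsed ms: " + runConcurrently(4, 200));
    }
}
```

This is the kind of workload where Python threads also help, because a thread that is blocked on I/O is not holding up the others.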

... how do [Java threads] really share memory?

The hardware deals with the issues of making the physical memory visible to all of the threads, and propagation of changes via the memory caches. It is complicated.


In Java the onus is on the programmer to "do the right thing" when threads make use of shared variables / objects. You need to use volatile variables, or synchronized blocks / methods, or something else that ensures that there is a happens-before chain between a write and a subsequent read. (Otherwise you can get issues with changes not being visible.)

This transfer of responsibility to the programmer allows the compiler to generate code with fewer main memory operations ... and hence code that is faster. The downside is that if an application doesn't obey the rules, it is liable to behave in unexpected ways.
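Here is a minimal sketch of those two tools, using a hypothetical SharedState class. The volatile flag guarantees that a write by one thread is seen by a later read in another thread, and the synchronized methods make the read-modify-write on the counter atomic as well as visible. It is only an illustration of the rules, not a recommended design.

```java
class SharedState {
    // volatile: a write happens-before any later read of the same field.
    private volatile boolean stopRequested = false;

    // Only read or written while holding the lock on "this".
    private int hits = 0;

    void requestStop() { stopRequested = true; }
    boolean isStopRequested() { return stopRequested; }

    synchronized void recordHit() { hits++; }
    synchronized int hits() { return hits; }

    // Several threads bump the same counter; the lock prevents lost updates.
    static int countHits(int threadCount, int hitsPerThread) {
        SharedState state = new SharedState();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    state.recordHit();
                }
            });
            workers[i].start();
        }
        joinAll(workers);
        return state.hits();
    }

    // A thread spins on the flag until another thread sets it.
    // Without volatile, the spinner would not be guaranteed to see the write.
    static boolean stopIsObserved() {
        SharedState state = new SharedState();
        Thread spinner = new Thread(() -> {
            while (!state.isStopRequested()) {
                // busy-wait until the flag flips
            }
        });
        spinner.start();
        state.requestStop();
        joinAll(spinner);
        return !spinner.isAlive();
    }

    private static void joinAll(Thread... threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("hits = " + countHits(4, 10_000)); // prints "hits = 40000"
        System.out.println("stop observed = " + stopIsObserved()); // prints "stop observed = true"
    }
}
```

If recordHit() were not synchronized, the final count could come out lower than 40000 on a multi-core machine, because two threads can read the same old value and both write back the same new one.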


1 - While the GIL is not formally part of the Python spec, the influence of the GIL on the (unspecified!) Python memory model and on Python programmers' assumptions makes it more than merely an implementation detail. It remains to be seen if Python can successfully evolve into a language where multi-threading can use multiple cores effectively.

CodePudding user response:

Not a complete answer here, but just adding a couple of things that Stephen C didn't already say:

  2. Python can only spawn threads on the same CPU, in contrast to Java?

That would be an optimization, not an essential fact. There's no reason in principle why Python could not simply allow the OS to schedule its threads on whatever CPU happened to be available at any given time.

OTOH, given that no two Python threads can do significant work at the same time, it could potentially improve performance if the threads all had affinity for the same CPU. (See what Stephen C said about "full context switch" vs. "only the (user) registers.")

Giving user-mode processes control over processor affinity is a relatively new feature in some operating systems. I have no idea of whether or not any Python version actually uses that feature.

  5. If Java does run ... multiple threads for parallelism ...?

Java doesn't "run multiple threads for parallelism." Your Java program creates multiple threads for whatever reason you happen to want them. Most modern OSs provide threads. Java simply makes that ability available to application programmers in a way that is tightly integrated with the language itself. You are free to use them (or not) however you see fit.
