Unexpected behavior of Threading.join() - Code between calls to Threading.join()

Does the order of the Threading.join()s in the source code GUARANTEE the order of execution of the threads, OR DOES IT SIMPLY ensure that the main thread does not finish UNTIL ALL THREADS FINISH?

import threading, time

class myThread(threading.Thread):
    def __init__(self, threadID, name, duerme):
        super(myThread, self).__init__()
        self.threadID = threadID
        self.name = name
        self.duerme = duerme

    def run(self):
        print("run()", self.name)
        time.sleep(self.duerme)
        print("%s termino " % self.name)


thread1 = myThread(1, "Thread-1", 20)
thread2 = myThread(2, "Thread-2", 12)
thread3 = myThread(3, "Thread-3", 6)

thread2.start()
thread1.start()
thread3.start()

thread3.join()
thread1.join()
thread2.join()

print("termino el hilo principal")

output:

run() Thread-2
run() Thread-1
run() Thread-3
Thread-3 termino
Thread-2 termino
Thread-1 termino
termino el hilo principal

If so, can I see it as a "hook" that simply tells me when threads end (guaranteeing that the code below the join()s is executed AFTER THE THREADS END)?

If I put code between join()s (I change the last part):

thread3.join()
print("debajo del thread3")
thread1.join()
print("debajo del thread1")
thread2.join()
print("debajo del thread2")

print("termino el hilo principal")

I get this:

run() Thread-2
run() Thread-1
run() Thread-3
Thread-3 termino
debajo del thread3
Thread-2 termino
Thread-1 termino
debajo del thread1
debajo del thread2
termino el hilo principal

It is as if the line "debajo del thread2" WAS waiting for thread1.

Expected output:

run() Thread-2
run() Thread-1
run() Thread-3
Thread-3 termino
debajo del thread3
Thread-2 termino
debajo del thread2
Thread-1 termino
debajo del thread1
termino el hilo principal

CodePudding user response：

Does the order of the Threading.join()s in the source code GUARANTEE the order of execution of the threads

Which threads are you asking about? If you are asking about the order of execution of thread1, thread2, and thread3, then the answer is, absolutely NOT. When you call t.join(), it does not do anything at all to thread t. IMO, the best way to think of join is: it does nothing, and it keeps doing nothing until thread t has finished, and then it returns.

OTOH, if you are asking about your main thread, then yes, when your main thread calls threadN.join(), then your main thread will be "blocked" until threadN is finished.
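To make that concrete, here is a minimal sketch (the worker function and the 0.2 second delay are made up for illustration). It only measures how long the calling thread spends inside join(); nothing is done to the joined thread.

```python
import threading
import time

def worker(delay, finished):
    # Hypothetical worker: sleep, then record the moment it finished.
    time.sleep(delay)
    finished.append(time.monotonic())

start = time.monotonic()
finished = []
t = threading.Thread(target=worker, args=(0.2, finished))
t.start()

before_join = time.monotonic() - start
t.join()  # the main thread blocks here; t itself is unaffected
after_join = time.monotonic() - start

print(f"join() called at {before_join:.2f}s, returned at {after_join:.2f}s")
print("worker still alive after join:", t.is_alive())
```

The second timestamp should be roughly the worker's sleep time, because the main thread simply sat inside join() until the worker was done.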

...order of execution of the threads...

Your thread1, thread2, and thread3 are completely unsynchronized. They run concurrently.* "Concurrency" means there are no guarantees about which thread does what, or when, unless they do something that explicitly synchronizes them.

An example of explicit synchronization would be if one thread releases a mutex that some other thread is waiting for. The thread that is waiting for the mutex is guaranteed not to proceed until the first thread releases it.
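A small sketch of that mutex case, assuming a plain threading.Lock and a made-up waiter function. Here the main thread plays the role of the thread that holds and then releases the mutex.

```python
import threading
import time

events = []
lock = threading.Lock()
lock.acquire()  # the main thread holds the mutex before the waiter starts

def waiter():
    # Blocks here until the main thread releases the lock.
    with lock:
        events.append("waiter got the lock")

t = threading.Thread(target=waiter)
t.start()

time.sleep(0.1)  # the waiter is blocked on the lock during this time
events.append("main releases the lock")
lock.release()

t.join()
print(events)  # ['main releases the lock', 'waiter got the lock']
```

The order of the two entries is guaranteed: the waiter cannot append anything until the release has happened.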

Another example of something that synchronizes threads is join. When thread A joins thread B, then thread A is guaranteed not to proceed until after thread B has ended.

* Doing things concurrently is the entire point of using threads. If you don't want concurrency, then don't use threads.

CodePudding user response：

So to answer your questions:

Does the order of the Threading.join()s in the source code GUARANTEE the order of execution of the threads

The answer is no. The order of execution can be controlled to some extent by using locks (mutexes, semaphores, ...). They let you synchronize certain pieces of your code, so that only one thread at a time has exclusive access to a critical section or to shared data. Apart from that, the order is not deterministic, and that's ok because that's how threads should work.
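As a rough illustration of the "exclusive access to shared data" part, here is a sketch with a made-up shared counter. The names state, state_lock and add_many are hypothetical. Note that the lock protects the data, but it says nothing about which thread gets in first.

```python
import threading

state = {"count": 0}           # shared data (hypothetical example)
state_lock = threading.Lock()

def add_many(n):
    for _ in range(n):
        # Only one thread at a time can be inside this block.
        with state_lock:
            state["count"] += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["count"])  # 40000
```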

In a concurrent environment, you don't really know whether thread A executes or finishes before or after thread B.

If you want to ensure that they run in sequence (A executes first, B executes after A), don't use threads. Run their instructions on the main thread one after another.
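A sketch of that point, with a hypothetical task function: calling the work directly gives a fixed order, and so does starting a thread and joining it before starting the next one. The second variant is still sequential, so the threads add nothing but overhead.

```python
import threading
import time

order = []

def task(name, delay):
    # Hypothetical task: sleep, then record that it ran.
    time.sleep(delay)
    order.append(name)

# Option 1: no threads at all, just call the functions in order.
task("A", 0.05)
task("B", 0.01)

# Option 2: threads, but each one is joined before the next one starts.
for name, delay in [("A", 0.05), ("B", 0.01)]:
    t = threading.Thread(target=task, args=(name, delay))
    t.start()
    t.join()

print(order)  # ['A', 'B', 'A', 'B']
```

Even though B sleeps for less time than A, it always finishes after A in both variants, because it does not start until A is done.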

OR DOES IT SIMPLY ensure that the main thread does not finish UNTIL ALL THREADS FINISH?

Not necessarily. If you don't want to wait for a thread to finish, that's ok. Python or the OS should handle its destruction in a proper way (unless an error occurs, which can happen).

Thread.join() should be called if you need to wait (block the current thread) for that single thread to be finished.

In my opinion, it's good practice to join your threads at the end of each script, so you know you are releasing resources once the application finishes executing. But in many answers found on the internet, even on SO, it's stated that it depends on what you want to do. If the thread needs to keep running after the main script finishes executing, its resources will be released after the thread finishes its operation.
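On the point about not joining at all: with ordinary (non-daemon) threads, the Python interpreter waits for them before the process exits. The sketch below checks this by running a tiny child script in a separate interpreter. The child script and its messages are made up for illustration, and it assumes Python 3.7+ for capture_output.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: the main thread never calls join().
child = textwrap.dedent('''
    import threading, time

    def work():
        time.sleep(0.2)
        print("thread done")

    threading.Thread(target=work).start()
    print("main done")
''')

result = subprocess.run(
    [sys.executable, "-c", child],
    capture_output=True,
    text=True,
)
lines = result.stdout.splitlines()
print(lines)  # ['main done', 'thread done']
```

The main thread's code ends first, but the process stays alive until the worker has printed its line. A thread created with daemon=True would not get that treatment.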

If so, Can I see it as a "hook" (telling me that the code below the join()s is sure to be executed AFTER THREADS END ) that simply tells me when threads end?

Again, no. It doesn't tell you whether a thread has ended. It blocks the current thread until the targeted thread finishes executing (that is, until its Thread.run() returns).

It's a blocking function that returns None as stated here. If you need to know whether the thread is still running, use Thread.is_alive().
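A short sketch of the difference, using the optional timeout argument of join(). The delays are arbitrary. Because join() returns None whether or not the thread finished, is_alive() is the only way to tell what happened.

```python
import threading
import time

t = threading.Thread(target=time.sleep, args=(0.3,))
t.start()

result = t.join(timeout=0.05)   # stop waiting after 0.05 s
alive_after_timeout = t.is_alive()
print(result, alive_after_timeout)  # None True

t.join()                        # now wait without a timeout
alive_after_join = t.is_alive()
print(alive_after_join)         # False
```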

Now about your EDIT:

If I put code between join()s (I change the last part):

thread3.join()
print("debajo del thread3")
thread1.join()
print("debajo del thread1")
thread2.join()
print("debajo del thread2")

print("termino el hilo principal")

I get this:

run() Thread-2
run() Thread-1
run() Thread-3
Thread-3 termino
debajo del thread3
Thread-2 termino
Thread-1 termino
debajo del thread1
debajo del thread2
termino el hilo principal

This is how your threads work:

1. When they start, they print the run string.
2. They wait for a predetermined number of seconds:
- Thread 1 waits for 20 seconds
- Thread 2 waits for 12 seconds
- Thread 3 waits for 6 seconds
3. When they finish waiting, they print the termino string.

So, let's run the script! All threads start right away. The order in which they actually begin running is not guaranteed. Thread 2 can run first, or maybe Thread 1 can be the last. The order depends on how the OS / Python interpreter schedules the threads' execution.

Regardless of which started first, all of them block once they reach the time.sleep() call, each waiting for its own timeout.

Meanwhile, the main thread joins Thread 3. It blocks (while the other threads are still sleeping) and waits for Thread 3 to wake up and finish.

After 6 seconds or more, that happens. Thread 3 prints the Thread-3 termino string and finishes.

Now the main thread becomes responsive again and goes on to its next instruction while Thread 1 and Thread 2 are still sleeping. It joins Thread 1, blocking until Thread 1 finishes.

About 12 seconds after the start, Thread 2 prints the Thread-2 termino string and finishes. About 20 seconds after the start, Thread 1 prints the Thread-1 termino string and finishes as well. Then the main thread becomes responsive again and goes on to its next instruction: join Thread 2.

It would block waiting for Thread 2 to finish. However, this thread has already finished, so join() returns immediately, and the main thread moves on to its next instructions until the script finishes executing.

That's what should happen based on the script you wrote. Thread.join() does not control which thread finishes executing first. It controls whether the calling thread may proceed: it may not until the joined thread has returned from its Thread.run() method. While the current thread is blocked inside Thread.join(), all the other threads keep executing normally because they run concurrently. They may even finish before the joined thread finishes and unblocks the current thread.
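The walkthrough above can be checked with a scaled-down sketch of the script. The SCALE factor, the sleeper function and the event strings are assumptions made for this example: the sleeps are shrunk from 20/12/6 seconds to 0.5/0.3/0.15 seconds, and events are collected in a list instead of being printed by each thread.

```python
import threading
import time

SCALE = 0.025  # assumption: 20/12/6 s become 0.5/0.3/0.15 s
events = []

def sleeper(name, delay):
    # Same shape as run() in the question: sleep, then report.
    time.sleep(delay)
    events.append(f"{name} finished")

thread1 = threading.Thread(target=sleeper, args=("Thread-1", 20 * SCALE))
thread2 = threading.Thread(target=sleeper, args=("Thread-2", 12 * SCALE))
thread3 = threading.Thread(target=sleeper, args=("Thread-3", 6 * SCALE))

thread2.start()
thread1.start()
thread3.start()

thread3.join()
events.append("main: after thread3.join()")
thread1.join()  # Thread-2 finishes while the main thread is blocked here
events.append("main: after thread1.join()")

t0 = time.monotonic()
thread2.join()  # Thread-2 is already done, so this returns at once
join2_wait = time.monotonic() - t0
events.append("main: after thread2.join()")

for line in events:
    print(line)
print(f"time spent in thread2.join(): {join2_wait:.4f}s")
```

The events come out in the same order as the output in the question: Thread-2 finishes between the two main-thread messages, and the last join() costs essentially no time.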

CodePudding user response：

Thanks guys, now I understand join(). To check it, I set thread1 to sleep for 80 seconds and thread2 for 100, and I could see what you told me. In my original script, the messages from print("debajo del thread1") and print("debajo del thread2") came out together (one right after the other) because by the time thread1 ended, thread2 had already ended. THAT'S WHY THE EFFECT OF THE "WAIT" WAS NOT CLEAR.

t.join() can definitely be seen as a kind of "sleep" for the current thread that lasts until thread t finishes.

Sorry everyone, I'm new to threads but this IS THE HARDEST thing to understand in this world. Now the hard part is to choose the best answer hehe.

So I leave the code in case you want to understand the "wait" of join():

import threading, time


class myThread(threading.Thread):
    def __init__(self, threadID, name, duerme):
        super(myThread, self).__init__()
        self.threadID = threadID
        self.name = name
        self.duerme = duerme

    def run(self):
        print("run()", self.name)
        time.sleep(self.duerme)
        print("%s termino " % self.name)


thread1 = myThread(1, "Thread-1", 80)
# I had to increase the sleep time of thread 2 so that the wait of the current
# thread can be seen (and the "interleaved" messages that SHOW THE
# WAIT are visible).
thread2 = myThread(2, "Thread-2", 100)

thread3 = myThread(3, "Thread-3", 6)

thread2.start()
thread1.start()
thread3.start()

thread3.join()
print("debajo del thread3")
print("antes de thread1.join()")
thread1.join()
print("debajo del thread1")
thread2.join()
print("debajo del thread2")

print("termino el hilo principal")

Output:

run() Thread-2
run() Thread-1
run() Thread-3
Thread-3 termino
debajo del thread3
antes de thread1.join()
Thread-1 termino 
debajo del thread1 // This was the hard-to-see effect
Thread-2 termino 
debajo del thread2
termino el hilo principal