What is the purpose of using join() with threading?

When using threads, you want fast, simultaneous execution of multiple tasks. Doesn't .join() defeat that purpose by waiting for each thread to finish before the next one starts? Wouldn't that be, in essence, exactly the same as a regular loop? Without join, the threads fire as quickly as they are started. My question may sound naive, as I'm still learning.

Let's say itemsArr has 1000 items and itemQueryRequest takes 3 seconds to execute. You want every item to be queried at as close to the same time as possible, so you use threading.

Also, the thread will die regardless of join once the target function completes, so what am I missing?

# lightning fast
from threading import Thread

for item in itemsArr:
    t = Thread(target=itemQueryRequest, args=(item,))
    t.start()

# SLOW
th = []
for item in itemsArr:
    t = Thread(target=itemQueryRequest, args=(item,))
    th.append(t)

for t in th:
    t.start()
    t.join()  # < SLOW: waits for this thread before starting the next one

CodePudding user response:

You're right: if you call join() immediately after starting a thread, it defeats the purpose of having a thread. You now have a child thread running, but your main thread is blocked until the child thread returns, so you still don't have any parallelism.

However, join() was not intended to be used that way. Instead, it's expected that you'll start() one or more threads, and then the main thread will either carry on with whatever it usually does, or call join() on each of the launched threads in order to block until all of them have exited. In either case, you have still achieved effective parallelism (Python GIL notwithstanding).
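To make the difference concrete, here is a minimal sketch comparing the two patterns. The function item_query_request is a hypothetical stand-in for the asker's itemQueryRequest that just sleeps for 0.1 seconds, and the item count of 5 is an arbitrary choice for illustration.

```python
import threading
import time


def item_query_request(item, results):
    """Hypothetical stand-in for itemQueryRequest: simulate slow I/O."""
    time.sleep(0.1)
    results.append(item)


def run_join_immediately(items):
    """start() and join() in the same loop: the threads run one at a time."""
    results = []
    started = time.perf_counter()
    for item in items:
        t = threading.Thread(target=item_query_request, args=(item, results))
        t.start()
        t.join()  # blocks here before the next thread is even created
    return time.perf_counter() - started, results


def run_start_all_then_join(items):
    """Start every thread first, then join them all: the waits overlap."""
    results = []
    started = time.perf_counter()
    threads = [
        threading.Thread(target=item_query_request, args=(item, results))
        for item in items
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # every thread is already running while we wait
    return time.perf_counter() - started, results


items = list(range(5))
serial_time, serial_results = run_join_immediately(items)
overlap_time, overlap_results = run_start_all_then_join(items)
print(f"join inside the start loop: {serial_time:.2f}s")
print(f"join after all are started: {overlap_time:.2f}s")
```

In the second version the join() calls cost almost nothing extra: the main thread waits roughly as long as the slowest thread, not the sum of all of them.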

The real purpose of join(), however, is to allow you to free up resources safely. For one thing, in lower-level thread APIs such as pthreads there are some underlying resources associated with each thread (such as its return value) that need to be retained in memory until join() (or detach()) is called, in case the parent thread wants to use them. Python's threading.Thread has no detach() method, but the same reasoning about waiting applies. More importantly, if the parent thread has allocated some resource that the child thread has access to, then it's generally not safe for the parent thread to free that resource until after the child thread has exited, since destroying it while the child thread is in the middle of using it would cause big problems for the child thread.

Similarly, if the child thread is working on preparing some data for the parent thread to use, it's not safe for the parent thread to try to use that data until after the child thread has finished preparing it -- there's no point in trying to use half-constructed data.

Given that, it's common for the parent thread to call join() to wait until the child thread has exited before doing any cleanup work that would affect the child thread.
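As a rough illustration of both points, the sketch below has the parent open a temporary file and hand it to a child thread, which also fills a results list. The names write_squares, log_file, and results are made up for this example. The parent only reads the results and closes the file after join() has returned.

```python
import tempfile
import threading


def write_squares(log_file, numbers, results):
    """Child thread: uses a file owned by the parent and prepares data for it."""
    for n in numbers:
        results.append(n * n)
        log_file.write(f"{n} squared\n")


results = []
log_file = tempfile.TemporaryFile(mode="w+")  # resource allocated by the parent

worker = threading.Thread(target=write_squares, args=(log_file, [1, 2, 3], results))
worker.start()

# ... the parent could do unrelated work here ...

worker.join()  # after this returns, the child has exited

# Safe now: the data is fully prepared and nobody else is using the file.
total = sum(results)
log_file.seek(0)
lines_logged = len(log_file.readlines())
log_file.close()
print(total, lines_logged)
```

Closing log_file or summing results before the join() call would race with the child thread, which is exactly the situation the answer warns about.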

If the child thread isn't designed to exit on its own in a finite period of time, the main thread might ask it to exit before making the join() call, e.g. by setting a boolean variable or writing a byte to a pipe. The child thread would react by exiting, so that the join() call wouldn't block indefinitely.
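One way to sketch that shutdown handshake in Python is with threading.Event, used here in place of the plain boolean variable mentioned above. The names stop_requested, background_worker, and ticks are hypothetical, and the timings are arbitrary.

```python
import threading
import time

stop_requested = threading.Event()  # plays the role of the "please exit" flag
ticks = []


def background_worker():
    """Child thread that would loop forever unless asked to stop."""
    while not stop_requested.is_set():
        ticks.append(time.monotonic())
        # Sleep briefly, but wake up early if a stop is requested.
        stop_requested.wait(0.01)


worker = threading.Thread(target=background_worker)
worker.start()

time.sleep(0.05)          # main thread does its own work for a while
stop_requested.set()      # ask the child to exit...
worker.join(timeout=2)    # ...then wait for it, with a safety timeout
print("worker still running:", worker.is_alive())
```

Without the stop_requested.set() call, the join() here would simply run into its timeout and the worker would keep looping.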