Python understanding threading and race condition

Time:06-09

I'm trying to clarify my understanding of threading.

As I understand it, when the GIL isn't released explicitly (e.g. by time.sleep), the OS hands it to threads in an arbitrary order, right? If so, why does no race condition ever occur here? I've re-run the code many times, and the ending value logged (last line) is always 5.

I would have thought that in some of the runs, a thread would read its local_copy before another thread had written back self.value, leading to a race condition. Furthermore, doesn't this mean that the threads run synchronously, since each thread waits for the previous thread to write self.value = local_copy?

My guess is that the OS has some way of detecting a read-then-write sequence on a shared attribute, and so hands the GIL to threads in a way that prevents any race condition from happening.

import logging
from concurrent import futures


class FakeDatabase:
    def __init__(self):
        self.value = 0

    def update(self, name):
        logging.info("Thread %s: starting update", name)
        # Read-modify-write on shared state, with no lock around it.
        local_copy = self.value
        local_copy += 1
        self.value = local_copy
        logging.info("Thread %s: finishing update", name)


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    database = FakeDatabase()
    with futures.ThreadPoolExecutor() as executor:
        for i in range(5):
            executor.submit(database.update, i)
    logging.info(f'Ending value is {database.value}')

CodePudding user response:

You only do one increment per thread, and that increment takes very little time, so it's likely that the increments never overlap. If I make each thread increment a million times with

        for _ in range(10**6):
            local_copy = self.value
            local_copy += 1
            self.value = local_copy

then I do see it happen:

INFO:root:Thread 0: starting update
INFO:root:Thread 1: starting update
INFO:root:Thread 2: starting update
INFO:root:Thread 3: starting update
INFO:root:Thread 4: starting update
INFO:root:Thread 4: finishing update
INFO:root:Thread 0: finishing update
INFO:root:Thread 3: finishing update
INFO:root:Thread 2: finishing update
INFO:root:Thread 1: finishing update
INFO:root:Ending value is 1745053
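
One common way to close that gap is to hold a lock around the read-modify-write. The following is only a minimal sketch, assuming a threading.Lock stored on the object. The LockedDatabase class, the _lock attribute, the increments parameter and the run helper are illustrative additions, not part of the original code.

```python
import threading
from concurrent import futures


class LockedDatabase:
    """Variant of FakeDatabase that guards the shared value (hypothetical name)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # assumption: one lock per object

    def update(self, increments):
        for _ in range(increments):
            # Only one thread at a time can be between the read and the write-back.
            with self._lock:
                local_copy = self.value
                local_copy += 1
                self.value = local_copy


def run(threads=5, increments=100_000):
    database = LockedDatabase()
    with futures.ThreadPoolExecutor(max_workers=threads) as executor:
        for _ in range(threads):
            executor.submit(database.update, increments)
    # Leaving the with-block waits for every submitted task to finish.
    return database.value


if __name__ == "__main__":
    print(run())
```

With the lock in place, the ending value should equal threads times increments on every run, at the cost of some locking overhead.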

CodePudding user response:

With CPU-bound tasks like the one in your example, Python threads effectively run one at a time under the hood, because the GIL lets only one thread execute Python bytecode at any given moment. The real benefit of threads in Python comes with I/O-bound tasks, such as waiting for something from the network or disk.

Although pretty old, this is an excellent talk about this
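
The I/O-bound case can be illustrated with a rough sketch that uses time.sleep as a stand-in for waiting on the network or disk (sleeping releases the GIL, so other threads can run in the meantime). The fake_io, run_sequential and run_threaded names and the 0.2-second delay are made up for illustration.

```python
import time
from concurrent import futures


def fake_io(delay):
    # Stand-in for a blocking network/disk call; sleep releases the GIL.
    time.sleep(delay)
    return delay


def run_sequential(n, delay):
    # Run the n waits one after another and return the elapsed time.
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    # Run the n waits in n worker threads and return the elapsed time.
    start = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=n) as executor:
        list(executor.map(fake_io, [delay] * n))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(5, 0.2):.2f}s")
    print(f"threaded:   {run_threaded(5, 0.2):.2f}s")
```

The threaded run should take roughly one delay rather than five, because the waits overlap. A pure-Python CPU-bound loop would not see that kind of speedup on a GIL build of CPython.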
