Best way to manage multiple threads and multiple jobs


I am a bit confused about how best to do the following. I am not after the code, but rather what I should do and when.

I want to do the following:

get a list of shops (model query)
    create a pool for each shop (pool1, pool2, pool3)
    for each shop, get all the products
        process each product by adding it to the pool, so pool1-product1, pool1-product2

The above shows that I have lots of shops, and each shop has lots of products that need to be processed. I want the shops to be processed at the same time.

I am confused about the best way to approach this.

Any help would be appreciated

Thanks

Grant

CodePudding user response:

You are trying to parallelize I/O-bound tasks (tasks consisting mostly of network calls and other I/O). You can either use a cooperative multitasking library such as AsyncIO or use a thread pool. Here is what it could look like using a thread pool:

from concurrent.futures import ThreadPoolExecutor

def process_product(product):
    ...  # the I/O-bound work for one product

def process_shop(shop):
    # products within a shop are processed sequentially here
    for product in shop.products:
        process_product(product)

def main():
    shops = [...]  # result of the model query

    with ThreadPoolExecutor() as executor:
        # one task per shop; iterating the result also re-raises
        # any exception a task may have thrown
        list(executor.map(process_shop, shops))

if __name__ == "__main__":
    main()

Usually, you do not want to create a new thread for each shop, as doing so can be expensive and inefficient; hence the use of a pool, which reuses a fixed number of threads.
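If you also want the products within each shop processed concurrently (matching the pool1-product1, pool1-product2 layout from the question), you can submit each product as its own task to one shared pool instead. A minimal sketch, where the shops dictionary and the string-formatting work inside process_product are hypothetical stand-ins for the model query and the real I/O:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_product(shop, product):
    # stand-in for the real I/O-bound work on one product
    return f"{shop}-{product}"

def main():
    # hypothetical data standing in for the model query results
    shops = {"shop1": ["p1", "p2"], "shop2": ["p3"]}

    with ThreadPoolExecutor(max_workers=8) as executor:
        # one task per (shop, product) pair, all sharing one pool
        futures = [
            executor.submit(process_product, shop, product)
            for shop, products in shops.items()
            for product in products
        ]
        # collect results as tasks finish; .result() re-raises errors
        return [f.result() for f in as_completed(futures)]

print(sorted(main()))
```

Because as_completed() yields futures in completion order, the results come back in no particular order; sort or key them by shop if order matters.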

With AsyncIO, you can use the gather() function to launch concurrent tasks. It would resemble:

import asyncio

async def process_product(product):
    ...  # the I/O-bound work for one product

async def process_shop(shop):
    # products within a shop are awaited one at a time here
    for product in shop.products:
        await process_product(product)

async def main():
    shops = [...]

    # run one task per shop concurrently on the event loop
    await asyncio.gather(*[process_shop(shop) for shop in shops])

if __name__ == "__main__":
    asyncio.run(main())
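As with the thread-pool version, the per-product work can also be made concurrent by nesting gather() calls: one gather over shops, and inside each shop another gather over its products. A minimal runnable sketch, where the shops dictionary and asyncio.sleep(0) are hypothetical stand-ins for the model query and the real async I/O:

```python
import asyncio

async def process_product(shop, product):
    await asyncio.sleep(0)  # stands in for real async I/O
    return f"{shop}-{product}"

async def process_shop(shop, products):
    # run all products of one shop concurrently
    return await asyncio.gather(
        *[process_product(shop, p) for p in products]
    )

async def main():
    # hypothetical data standing in for the model query results
    shops = {"shop1": ["p1", "p2"], "shop2": ["p3"]}
    # one task per shop; each shop task fans out over its products
    per_shop = await asyncio.gather(
        *[process_shop(s, ps) for s, ps in shops.items()]
    )
    # flatten the per-shop result lists into one list
    return [r for shop_results in per_shop for r in shop_results]

print(asyncio.run(main()))
```

Unlike as_completed(), gather() returns results in the order the awaitables were passed in, so the output order here is deterministic.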