I am a bit confused about how best to do the following. I am not after the code, but rather what I should do and when.
I want to do the following:
1. Get a list of shops (model query).
2. Create a pool for each shop (pool1, pool2, pool3).
3. For each shop, get all the products.
4. Process each product by adding it to the shop's pool, so pool1-product1, pool1-product2, and so on.
In other words, I have lots of shops, and each shop has lots of products that need to be processed. I want the shops to be processed at the same time.
I am not sure what the best way to approach this is.
Any help would be appreciated.
Thanks
Grant
CodePudding user response:
You are trying to parallelize I/O-bound tasks (tasks that consist mostly of network calls and other I/O). You can use either a cooperative multitasking library such as asyncio or a thread pool for that. Here is what it could look like using a thread pool:
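from concurrent.futures import ThreadPoolExecutor

def process_product(product):
    ...

def process_shop(shop):
    # Products of one shop are processed one after another in a single worker thread.
    for product in shop.products:
        process_product(product)

def main():
    shops = [...]
    # One shared pool; each shop is submitted as a separate task.
    with ThreadPoolExecutor() as executor:
        # Consuming the iterator re-raises any exception raised in a worker.
        list(executor.map(process_shop, shops))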
Usually, you do not want to create a new thread for each shop as it could be expensive and inefficient, hence the use of a pool.
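To make the thread pool version concrete, here is a minimal runnable sketch. The SHOPS dictionary, the time.sleep() call standing in for real I/O, and the run() helper are all assumptions made for illustration; they are not part of the question's actual models.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in data: shop name -> list of product names.
SHOPS = {
    "shop1": ["product1", "product2"],
    "shop2": ["product1", "product2", "product3"],
    "shop3": ["product1"],
}

def process_product(shop, product):
    # Simulate an I/O-bound call (for example a network request).
    time.sleep(0.01)
    return f"{shop}-{product}"

def process_shop(shop):
    # Products of one shop are handled sequentially inside one worker thread.
    return [process_product(shop, product) for product in SHOPS[shop]]

def run(max_workers=3):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, not completion order.
        results = executor.map(process_shop, SHOPS)
        return dict(zip(SHOPS, results))

if __name__ == "__main__":
    for shop, processed in run().items():
        print(shop, processed)
```

Here a single pool is shared by all shops, and max_workers caps how many shops are processed at the same time. That is usually enough, so a separate pool per shop should not be needed.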
With asyncio, you can use the gather() function to launch concurrent tasks. It would resemble:
import asyncio

async def process_product(product):
    ...

async def process_shop(shop):
    for product in shop.products:
        await process_product(product)

async def main():
    shops = [...]
    await asyncio.gather(*[process_shop(shop) for shop in shops])

if __name__ == "__main__":
    asyncio.run(main())
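If the products within a shop can also be processed concurrently, one possible variation is to call gather() at both levels and cap the total number of in-flight operations with a semaphore. This is only a sketch: the SHOPS data, the asyncio.sleep() call standing in for real I/O, and the max_concurrent parameter are assumptions made for illustration.

```python
import asyncio

# Hypothetical stand-in data: shop name -> list of product names.
SHOPS = {
    "shop1": ["product1", "product2"],
    "shop2": ["product1", "product2", "product3"],
    "shop3": ["product1"],
}

async def process_product(shop, product, limit):
    # The semaphore caps how many products are in flight at once.
    async with limit:
        # Simulate an awaitable I/O call (for example an HTTP request).
        await asyncio.sleep(0.01)
        return f"{shop}-{product}"

async def process_shop(shop, products, limit):
    # All products of one shop are scheduled concurrently.
    return await asyncio.gather(
        *(process_product(shop, product, limit) for product in products)
    )

async def main(max_concurrent=5):
    limit = asyncio.Semaphore(max_concurrent)
    # gather() returns results in the order the awaitables were passed in.
    results = await asyncio.gather(
        *(process_shop(shop, products, limit) for shop, products in SHOPS.items())
    )
    return dict(zip(SHOPS, results))

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note that this only helps if the work inside process_product() is awaitable. A blocking call there would stall the event loop, in which case the thread pool approach is the better fit.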