Say you have a loop with a varying number of iterations and 4 cores.
I understand that
#pragma omp parallel for
will divide the iterations into contiguous chunks of roughly size/4 iterations each, like this:
| T1 | T2 | T3 | T4 |
However, in my particular situation the behavior below would be more advantageous, where each chunk has length 1 (size/size). Thread 1 would then not get the contiguous block 0..size/4-1, but instead iterations 0, size/4, 2*size/4, 3*size/4:
|T1|T2|T3|T4|T1|T2|T3|T4|T1|T2|T3|T4|T1|T2|T3|T4|
How can I have my code execute like this when the number of iterations is not known until runtime?
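For concreteness, a minimal sketch of a loop of this shape (the value of size and the printf are placeholders; in practice size is only known at runtime):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int size = 16;  /* placeholder: in the real code this is only known at runtime */

        /* With no schedule clause, most implementations use static scheduling,
           so each of the 4 threads gets one contiguous block of about size/4 iterations. */
        #pragma omp parallel for num_threads(4)
        for (int i = 0; i < size; i++) {
            printf("iteration %d -> thread %d\n", i, omp_get_thread_num());
        }
        return 0;
    }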
CodePudding user response:
What you are describing -- assuming that size/4 in your example stands for size/total_threads -- is round-robin scheduling, which in OpenMP is static scheduling with chunk_size = 1. For that you simply need:
#pragma omp parallel for schedule(static,1)
In this case, it makes no difference whether the number of iterations is known at compile time or only at runtime.
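As a quick check, here is a minimal, self-contained sketch (size, num_threads(4) and the printf are only for illustration) that prints which thread executes which iteration:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int size = 16;  /* illustrative value; it could just as well be computed at runtime */

        /* schedule(static,1) deals the iterations out round-robin,
           so with 4 threads, thread t executes iterations t, t+4, t+8, ... */
        #pragma omp parallel for schedule(static,1) num_threads(4)
        for (int i = 0; i < size; i++) {
            printf("iteration %d handled by thread %d\n", i, omp_get_thread_num());
        }
        return 0;
    }

Compiled with OpenMP enabled (e.g. gcc -fopenmp), the output lines may appear in any order, but each iteration index maps to the thread expected from the round-robin pattern in the question.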