We have a parallel scheduling system that works as follows:
There are N parallel processes (or agents), each of which computes one task at a time and runs it to completion without preemption.
We have an event log containing the start and end times of every task. Tasks are grouped into jobs, and several jobs run in parallel. Jobs don't depend on each other, and the tasks within a job can ALL run in parallel at the same time.
What happens is that as we increase the number of jobs, the effective parallelism of each job drops; or rather, a major chunk of a job's tasks ends up executing nearly serially.
e.g. if I have 10 processes and 10 jobs with 10 tasks each, then each job effectively runs on just 1 process, so the tasks within each job run serially.
I want to quantify this effect, and I am not sure what the best way to do so is. Perhaps what I want to quantify is resource contention.
Some ideas I had in mind:
1. Create a graph of tasks per job.
There is a directed edge from task T1 to task T2 if T1.end < T2.start.
Then the number of tasks that were executed serially would be the length of the longest path in this directed graph (see the sketch after this list).
Do you see any concern with this approach? Is there a better way to model this?
2. Time to start a task.
For each task, I calculate task.start - job.start.
But then again, what information does this data actually provide?
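Regarding idea 1, here is a minimal sketch, assuming each task is a dict with hypothetical 'start'/'end' keys and that it is called once per job. Because two tasks are linked exactly when one ends before the other starts, the longest path in this DAG equals the largest set of pairwise non-overlapping tasks, which a greedy pass over tasks sorted by end time finds:

```python
def longest_serial_chain(tasks):
    """Length of the longest path in the DAG with an edge T1 -> T2
    whenever T1.end < T2.start, i.e. the longest chain of tasks that
    ran strictly one after another. Pass in the tasks of one job."""
    chain = 0
    last_end = float("-inf")
    for t in sorted(tasks, key=lambda t: t["end"]):
        if t["start"] > last_end:   # matches the edge condition prev.end < next.start
            chain += 1
            last_end = t["end"]
    return chain
```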
CodePudding user response:
A simple metric: calculate how long it would have taken if every job had run fully in parallel (i.e. if there were an infinite number of processors). This is the "perfect time". Subtract the perfect time from how long it actually took to complete every job, divide by the perfect time, and express the result as a percentage. This is a measure of how much contention occurred.
First, let's look at the perfect job completion time: since all of a job's tasks can run in parallel, this is simply the maximum of the task durations in the job.
If all the jobs start at the same time, then the perfect time for ALL jobs is simply the longest perfect job time.
If the jobs start at different times, then you have to take the maximum of job start time + perfect job time over all jobs.
Perfect job time for job j:
PJT(j) = MAX( td(t) ) over all tasks t in j, where td(t) is the duration of task t
Perfect time:
PT = MAX( starttime(j) + PJT(j) ) over all jobs j
Contention metric:
C = 100 * ( actual completion time of all jobs - PT ) / PT
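As a sketch of how this could be computed from the event log, assuming tasks are dicts with hypothetical 'job', 'start', 'end' keys and that each job's start time (the question's job.start) is available in a separate mapping:

```python
from collections import defaultdict

def contention_metric(tasks, job_start):
    """Contention metric C = 100 * (actual - PT) / PT.

    Assumes `tasks` is a list of dicts with hypothetical keys 'job',
    'start', 'end' (numeric timestamps on one clock) and `job_start`
    maps each job id to the time the job became runnable (the
    question's job.start). Durations are measured from the earliest
    job start."""
    by_job = defaultdict(list)
    for t in tasks:
        by_job[t["job"]].append(t)

    t0 = min(job_start.values())             # reference point for durations

    # PJT(j) = MAX( td(t) ) over the tasks t of job j
    # PT     = MAX( starttime(j) + PJT(j) ) over all jobs j
    pt = max(
        (job_start[j] - t0) + max(t["end"] - t["start"] for t in ts)
        for j, ts in by_job.items()
    )

    # Actual time taken to complete all the jobs
    actual = max(t["end"] for t in tasks) - t0
    return 100.0 * (actual - pt) / pt
```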
Basic example:
3 jobs, 1 task each, each task 10 secs long
2 processors available
Actual time is 20 secs
Perfect time is 10 secs
Contention is 100%

4 jobs, 1 task each, each task 10 secs long
2 processors available
Actual time is 20 secs
Perfect time is 10 secs
Contention is 100%
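For instance, the first example above, expressed as an event log and fed to the contention_metric sketch (same hypothetical field names):

```python
# 3 jobs, 1 task each, 10 secs per task, 2 processors:
# two tasks run at t=0..10, the third has to wait and runs at t=10..20.
log = [
    {"job": "A", "start": 0,  "end": 10},
    {"job": "B", "start": 0,  "end": 10},
    {"job": "C", "start": 10, "end": 20},
]
starts = {"A": 0, "B": 0, "C": 0}        # all three jobs became runnable at t=0
print(contention_metric(log, starts))    # 100.0: actual 20 secs vs perfect 10 secs
```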
Task Delayed Metric:
TDM = 100 * ( number of tasks delayed ) / ( total number of tasks )
Basic example:
3 jobs, 1 task each, each task 10 secs long
2 processors available
1 task is delayed
TDM is 33%

4 jobs, 1 task each, each task 10 secs long
2 processors available
2 tasks are delayed
TDM is 50%
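A sketch of the TDM over the same hypothetical log format. The answer doesn't pin down what "delayed" means; here I assume a task is delayed if it starts later than its job's start time (plus an optional tolerance), since every task of a job could in principle start the moment the job does:

```python
def task_delayed_metric(tasks, job_start, tolerance=0.0):
    """TDM = 100 * (number of tasks delayed) / (total number of tasks).

    Assumption: a task counts as delayed if it starts more than
    `tolerance` after its job's start time, job_start[task's job]."""
    delayed = sum(
        1 for t in tasks if t["start"] - job_start[t["job"]] > tolerance
    )
    return 100.0 * delayed / len(tasks)
```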
Metric comparison
The contention metric gives a measure of the total delay incurred by all the jobs considered as one unit, while the TDM gives a better measure of the delays experienced by individual tasks. Which one to use depends on what you are most interested in: overall delay or individual task delay.
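To make that concrete, running both sketches on the 4-job example (same hypothetical log format) shows that the contention metric cannot distinguish the two examples above, while the TDM can:

```python
# 4 jobs, 1 task each, 10 secs per task, 2 processors:
log4 = [
    {"job": "A", "start": 0,  "end": 10},
    {"job": "B", "start": 0,  "end": 10},
    {"job": "C", "start": 10, "end": 20},
    {"job": "D", "start": 10, "end": 20},
]
starts4 = {j: 0 for j in "ABCD"}
print(contention_metric(log4, starts4))    # 100.0 - same as the 3-job example
print(task_delayed_metric(log4, starts4))  # 50.0  - higher than the 3-job example
```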