I'm currently trying to understand resource allocation within a Cloudera cluster. In our organization we use the FairScheduler (https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html), and I'm not sure I understand the FAIR policy correctly.
To summarize what I've understood so far:
FIFO: Every job gets all the resources it needs, in arrival order, until all resources are allocated. From that point on, applications have to wait for free resources and are executed in the same order as they arrived.
FAIR: Every job gets a fair share of the resources. If only 1 job arrives, it gets all the available resources. If 2 jobs arrive, each job gets 1/2 of the resources.
But what happens if job 1 needs only 25% whereas job 2 needs 75%? Will this be a problem (job 1 gets its 25%, but job 2 is capped at 50%)? Or will this be solved with max-min fairness? (I sketched what I mean by max-min fairness below.)
DRF: Seeks to maximize the smallest dominant share in the system, then the second-smallest, and so on. (I know it's more complex, but my question relates more to the FAIR policy.)
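To make my question concrete, here is a small Python sketch of max-min fairness as I understand it (progressive filling). This is my own toy model, not FairScheduler code, and the demands are just the hypothetical 25%/75% from above:

```python
def max_min_shares(demands, capacity=1.0):
    """Toy progressive-filling allocator: split the remaining capacity
    equally among unsatisfied jobs; a job demanding less than its equal
    split keeps only its demand, and the leftover is redistributed."""
    shares = {job: 0.0 for job in demands}
    unsatisfied = dict(demands)
    remaining = capacity
    while unsatisfied and remaining > 1e-9:
        equal_split = remaining / len(unsatisfied)
        fits = {j: d for j, d in unsatisfied.items() if d <= equal_split}
        if not fits:
            # Every remaining job wants more than the equal split,
            # so each one simply gets the equal split.
            for j in unsatisfied:
                shares[j] += equal_split
            break
        for j, d in fits.items():
            shares[j] = d        # fully satisfy the small demand
            remaining -= d
            del unsatisfied[j]
    return shares

print(max_min_shares({"job1": 0.25, "job2": 0.75}))
# {'job1': 0.25, 'job2': 0.75} -- both demands fit, no contention
print(max_min_shares({"job1": 0.25, "job2": 0.75, "job3": 0.75}))
# job1 -> 0.25, job2 -> 0.375, job3 -> 0.375 -- contention case
```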
CodePudding user response:
Your example doesn't really create contention and therefore wouldn't really illustrate the difference between FIFO and FAIR. Scheduler policies only really come into play when requests for resources exceed the existing capacity (resource contention). So your example of 25% and 75% would run as-is and there wouldn't be an issue; you wouldn't see a difference in handling between FIFO and FAIR.
When a third job is submitted to the same queue, that is when the policy would try to enforce "fair sharing" across the 3 jobs. How aggressively YARN enforces the policy and how it impacts the running jobs depends on your settings, but there would be an effort to re-allocate the resources "fairly", as sketched below.
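To make that concrete, here's a toy sketch (my own model reusing the made-up 25%/75% numbers from the question, not FairScheduler's actual logic): once a third job arrives, each job's instantaneous fair share becomes capacity / 3, and jobs holding more than that are candidates for rebalancing. Whether YARN actively kills containers to get there depends on preemption being enabled (the yarn.scheduler.fair.preemption property, which is off by default).

```python
# Toy model, not FairScheduler code: cluster capacity normalized to 1.0,
# usage numbers taken from the question's hypothetical 25%/75% split.
capacity = 1.0
usage = {"job1": 0.25, "job2": 0.75, "job3": 0.0}  # job3 just arrived
fair_share = capacity / len(usage)                 # each job's target share

for job, used in usage.items():
    over = max(0.0, used - fair_share)
    print(f"{job}: using {used:.2f}, fair share {fair_share:.2f}, "
          f"over fair share by {over:.2f}")
# job2 sits ~0.42 above its fair share; with preemption enabled, YARN can
# reclaim containers from it so job3 can ramp up toward ~0.33.
```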
This video helps illustrate this topic and is worth a watch.
Here's another post (that I didn't create) that also does a good job of explaining the different scheduling strategies.