Apologies in advance for a beginner question; I am still new to Hadoop. I am worried about a problem that a bad configuration on my part might cause in the future. Hadoop is a powerful tool, but it is useless if I can't use it properly.
Is it possible for a MapReduce job in Hadoop to end up running only mappers because no resources can be allocated to the reducers?
For example, suppose I have a lot of input, the YARN memory limit on my machine is 10 GB, the maximum mapper memory is 1 GB, and the maximum reducer memory is 4 GB. Assume vcores are not a constraint and that I have many jobs/splits.
Then all 10 GB could be taken by 10 mappers, and a reducer could never get in: when one mapper finishes, the 1 GB it frees is not enough for a 4 GB reducer container, so another mapper runs there instead. In the end the cluster would only process mappers, and every job would be stuck because no reducer can run. (This has not actually happened to me yet, maybe because my input is limited.)
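To make the arithmetic in this scenario concrete, here is a minimal sketch using the numbers from the question. It only models memory bookkeeping on one machine; it does not model how any real YARN scheduler picks the next container, and all names in it are illustrative.

```python
import math

YARN_MEMORY_GB = 10  # total memory YARN may allocate on the machine
MAPPER_GB = 1        # memory per mapper container
REDUCER_GB = 4       # memory per reducer container


def free_memory_gb(running_mappers: int) -> int:
    """Memory left over while `running_mappers` mapper containers run."""
    return YARN_MEMORY_GB - running_mappers * MAPPER_GB


def reducer_fits(running_mappers: int) -> bool:
    """True if one reducer container fits next to the running mappers."""
    return free_memory_gb(running_mappers) >= REDUCER_GB


# Mappers that must finish, without being replaced by new mappers,
# before one reducer fits on a machine that started out full of mappers.
mappers_to_drain = math.ceil(REDUCER_GB / MAPPER_GB)

if __name__ == "__main__":
    max_mappers = YARN_MEMORY_GB // MAPPER_GB
    # Walk down from a full machine until the reducer finally fits.
    for running in range(max_mappers, max_mappers - mappers_to_drain - 1, -1):
        print(running, free_memory_gb(running), reducer_fits(running))
```

The point of the sketch is that a single finished mapper never frees enough memory; the freed space has to be held back across several mapper completions before a reducer can start.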
Can anyone recommend settings or tips to avoid this?
Thank you, and sorry if I have misunderstood how Hadoop schedules containers.
CodePudding user response:
Yes, it is possible for containers to get stuck waiting for resources unless you enable preemption and set up YARN queue capacities.
YARN ships with only a few built-in schedulers: FIFO, Capacity, and Fair, and only the latter two support queues with preemption. Read the docs to see which suits your needs.
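As a rough illustration of what "enable preemption and set up queue capacity" can look like with the Capacity Scheduler, the sketch below renders the relevant properties as Hadoop-style XML. The property names are real YARN settings, but the queue names `batch` and `adhoc`, the 70/30 split, and the helper `render` are hypothetical. As far as I know this preemption mainly rebalances resources between queues, so treat it as a starting point and check it against the documentation for your Hadoop version.

```python
from xml.sax.saxutils import escape

# Goes in yarn-site.xml: select the Capacity Scheduler and turn on the
# scheduling monitor that drives preemption.
YARN_SITE = {
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager."
        "scheduler.capacity.CapacityScheduler",
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
    "yarn.resourcemanager.scheduler.monitor.policies":
        "org.apache.hadoop.yarn.server.resourcemanager."
        "monitor.capacity.ProportionalCapacityPreemptionPolicy",
}

# Goes in capacity-scheduler.xml. Queue names and percentages are
# hypothetical; sibling queue capacities must add up to 100.
CAPACITY_SCHEDULER = {
    "yarn.scheduler.capacity.root.queues": "batch,adhoc",
    "yarn.scheduler.capacity.root.batch.capacity": "70",
    "yarn.scheduler.capacity.root.adhoc.capacity": "30",
}


def render(properties: dict) -> str:
    """Render a name/value mapping as a Hadoop <configuration> document."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(YARN_SITE))
    print(render(CAPACITY_SCHEDULER))
```

The generated fragments would be merged into the existing `yarn-site.xml` and `capacity-scheduler.xml` rather than replacing them.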