Question: on AMI 3.0.4 with c3.4xlarge, how do I set the number of parallel tasks per node?

Time:10-03

Hello,
I have run into this problem before, and none of the methods I tried helped.
I run a Pig script on EMR that produces about 500 mappers and 20 reducers. When I run the script on c3.4xlarge instances, I see only 2 mappers per machine; I want to increase that to 16 mappers.
I created the EMR cluster with the Ruby CLI; here is my command:
elastic-mapreduce --create --name "Test Pig" \
--visible-to-all-users \
--num-instances 3 \
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia \
--bootstrap-action "s3://elasticmapreduce/bootstrap-actions/configure-hadoop" \
--args "-m,mapred.tasktracker.map.tasks.maximum=16,-m,mapred.tasktracker.map.tasks.maximum=16,-m,mapred.tasktracker.reduce.tasks.maximum=16" \
--master-instance-type "c3.4xlarge" \
--slave-instance-type "c3.4xlarge" \
--pig-script \
--args s3://my-bucket/pig/myscript.pig \
--pig-versions 0.11.1.1 --ami-version 3.0.4
I also tried changing where the slot settings are defined, but without success. I made the settings in the Hadoop configuration (and verified them through the UI), but I still see only two tasks running per node (2 mappers, or one mapper and one reducer, or 2 reducers). Strangely, when I tested with m1.xlarge instances, the run succeeded.
Thank you very much for your help.

CodePudding user response:

Hello,
When you use an AMI 3.x, you are running Hadoop 2 with the YARN framework, and settings in the mapred.tasktracker.map.tasks.maximum style apply only to Hadoop 1. On EMR with Hadoop 2 / YARN, the YARN system controls the number of map/reduce slots in a more dynamic way, so there are no fixed slot counts as before.
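As a rough illustration of that dynamic behavior, per-node map parallelism under YARN is approximately how many map containers fit into the memory the NodeManager advertises. The numbers below are hypothetical, for illustration only:

```shell
# Simplified model (an assumption, not the exact YARN scheduler logic):
# parallel mappers per node ≈ node memory / per-map container memory.
NODE_MEMORY_MB=24000      # hypothetical yarn.nodemanager.resource.memory-mb
MAP_CONTAINER_MB=3000     # hypothetical mapreduce.map.memory.mb
echo $((NODE_MEMORY_MB / MAP_CONTAINER_MB))   # -> 8 concurrent mappers
```

So to get more mappers per node, you lower the per-container memory (or raise the node's advertised memory) rather than setting a slot maximum.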
You can have a look at the YARN introduction at http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html to see how resources are configured and how they affect map/reduce tasks.
You can also take a look at the following document:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html shows the default values for each instance type; c3.4xlarge and m1.xlarge end up with different degrees of task parallelism. The main difference is that on m1.xlarge, yarn.nodemanager.resource.memory-mb leaves roughly 1.5 times more room relative to the per-task memory requirements than on c3.4xlarge.

CodePudding user response:

Thank you very much, I will adjust the memory values for the mappers/reducers.
Also, I remember I only tried mapreduce.map.java.opts and have not yet tried mapreduce.map.memory.mb; I will try combining the two.
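For reference, one way to pass those Hadoop 2 properties is through the same configure-hadoop bootstrap action already used in the question, since mapreduce.map.memory.mb and mapreduce.map.java.opts belong to mapred-site. The memory values below are hypothetical and only illustrate keeping the -Xmx heap below the container size; tune them so that node memory divided by container memory yields the parallelism you want:

```shell
--bootstrap-action "s3://elasticmapreduce/bootstrap-actions/configure-hadoop" \
--args "-m,mapreduce.map.memory.mb=1440,-m,mapreduce.map.java.opts=-Xmx1152m,-m,mapreduce.reduce.memory.mb=2880,-m,mapreduce.reduce.java.opts=-Xmx2304m"
```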
Tags: AWS