Building a Hadoop platform

Time: 09-27

Hadoop download address:
http://www.apache.org/dyn/closer.cgi/hadoop/core/
Version: hadoop-0.17.1

JDK installation:
JDK 1.5.07 or later must be installed.

Step-by-step environment setup:
1. Hardware environment
We use three machines, all running Red Hat 4.1.2-42, each with an account named "mingjie", as follows:
Host name: hdfs1  IP: 192.168.0.221  Role: NameNode, JobTracker
Host name: hdfs2  IP: 192.168.0.227  Role: DataNode, TaskTracker
Host name: hdfs3  IP: 192.168.0.228  Role: DataNode, TaskTracker

Key point: edit /etc/hosts on all three machines so that every host name and IP resolves correctly:
127.0.0.1 localhost
192.168.0.221 hdfs1
192.168.0.227 hdfs2
192.168.0.228 hdfs3
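To confirm the entries resolve as expected, a quick check from each machine (a minimal sketch; run it on all three hosts):

ping -c 1 hdfs1
ping -c 1 hdfs2
ping -c 1 hdfs3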
2. Install the Java environment on each machine. We use the unified path /opt/modules/jdk1.6 and add it to the system environment variables with sudo vi /etc/profile:

JAVA_HOME=/opt/modules/jdk1.6
PATH=$JAVA_HOME/bin:$PATH:$CATALINA_HOME/bin
CLASSPATH=$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME
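After saving /etc/profile, the settings can be verified in a new shell (a quick sketch; it assumes the JDK really is installed under /opt/modules/jdk1.6):

source /etc/profile
echo $JAVA_HOME     # should print /opt/modules/jdk1.6
java -version       # should report JDK 1.5.07 or later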
3. Download hadoop-0.17.1 and extract it to /home/mingjie. It is also best to add the Hadoop directories to the environment variables (a short extraction sketch follows the variable list below):

HADOOP_HOME=/home/mingjie/hadoop-0.17.1    # the Hadoop home directory
export HADOOP_HOME
HADOOP_CONF_DIR=$HADOOP_HOME/conf    # the Hadoop configuration file directory
export HADOOP_CONF_DIR
HADOOP_LOG_DIR=/home/mingjie/hadoop-0.17.1/log    # directory for run-time logs
export HADOOP_LOG_DIR
export PATH=$PATH:$HADOOP_HOME/bin
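A minimal sketch of the download-and-extract step, assuming the release tarball is named hadoop-0.17.1.tar.gz (the exact file name on the mirror may differ):

cd /home/mingjie
tar -xzf hadoop-0.17.1.tar.gz    # produces /home/mingjie/hadoop-0.17.1
source /etc/profile              # pick up HADOOP_HOME, HADOOP_CONF_DIR and PATH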

4, installation of SSH, and generate the public and private keys
? Running SSH - the keygen -t rsa, according to the screen prompt directly select "enter"
? In the user directory ~/. SSH/produce two files, id_rsa, id_rsa.pub
? The cat ~/. SSH/id_dsa. Pub & gt;> ~/. SSH/authorized_keys
After completion of the above configuration, perform the SSH localhsot, confirm you each machine can use SSH

5. Append the master's authorized_keys to the authorized_keys file on the two slave machines, so that the master can also reach both slave servers without a password:

sudo scp authorized_keys hdfs2:/home/mingjie/.ssh/
sudo scp authorized_keys hdfs3:/home/mingjie/.ssh/
ssh hdfs2
ssh hdfs3
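A quick way to confirm passwordless login from the master to both slaves (a sketch; the loop is only for illustration):

for h in hdfs2 hdfs3; do
    ssh $h hostname    # should print the slave's host name without asking for a password
done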
6. Next, edit two Hadoop files, conf/masters and conf/slaves:
Master setting (<HADOOP_HOME>/conf/masters): hdfs1
Slave setting (<HADOOP_HOME>/conf/slaves): hdfs2, hdfs3
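For clarity, the two files end up with one host name per line:

conf/masters:
hdfs1

conf/slaves:
hdfs2
hdfs3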
7. Edit conf/hadoop-env.sh:
export JAVA_HOME=/opt/jdk1.6.0_03
8. Edit conf/hadoop-site.xml. Only a few commonly used properties are configured here; for Hadoop performance tuning you need to study hadoop-default.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hdfs1:54310/</value>
    <description>Your NameNode configuration: machine name and port.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://hdfs1:54311</value>
    <description>Your JobTracker configuration: machine name and port.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Number of replicas to keep for each block; the default is 3.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/mingjie/hadoop-0.17.1/tmp</value>
    <description>Hadoop's default temporary path; it is best to configure this explicitly. If a new node, or some other puzzling situation, leaves a DataNode unable to start, just delete the files in this tmp directory; but if you delete this directory on the NameNode machine, you will need to run the NameNode format command again.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
    <description>JVM options for child tasks; adjust as needed.</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>5120000</value>
    <description>The default block size for new files, in bytes. It must be a multiple of 512, because 512 bytes is the smallest checksum unit used for CRC file-integrity checks.</description>
  </property>
</configuration>

9. Then copy the whole Hadoop environment to hdfs2 and hdfs3:
scp -r /home/mingjie/hadoop-0.17.1 hdfs2:/home/mingjie/hadoop-0.17.1
scp -r /home/mingjie/hadoop-0.17.1 hdfs3:/home/mingjie/hadoop-0.17.1
10. On the NameNode hdfs1, format a new distributed file system; its data lives under the hadoop.tmp.dir path specified in hadoop-site.xml.
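A minimal sketch of the format step (run it only on hdfs1, and only once; reformatting wipes the existing file system metadata):

cd /home/mingjie/hadoop-0.17.1
bin/hadoop namenode -format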

That completes the deployment of the Hadoop environment.
Start Hadoop: <HADOOP_HOME>/bin/start-all.sh
Stop Hadoop: <HADOOP_HOME>/bin/stop-all.sh

Description:
(1) Starting the Hadoop processes:
• On the master server, three Java processes start: NameNode, SecondaryNameNode and JobTracker. Two files appear in the log directory, corresponding to the NameNode log and the JobTracker log.
• On each slave server, two Java processes start: DataNode and TaskTracker, and two files appear in the log directory, corresponding to the DataNode log and the TaskTracker log. Start by checking these logs to verify that Hadoop came up correctly.
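A simple way to confirm these processes are up is the JDK's jps tool (a sketch; the process IDs shown are made up):

jps
# on the master, something like:
#   12001 NameNode
#   12002 SecondaryNameNode
#   12003 JobTracker
# on a slave:
#   8801 DataNode
#   8802 TaskTracker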
(2) Browsing the distributed file system through a web browser:
• Visit http://hdfs1:50030 to view the JobTracker's running state.
• Visit http://hdfs2:50060 (or http://hdfs3:50060) to view a TaskTracker's running state.
• Visit http://hdfs1:50070 to view the NameNode and the state of the distributed file system.
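Once the web interfaces respond, a small smoke test against HDFS confirms that the file system is usable (a sketch; /tmp/test.txt is just an example file):

cd /home/mingjie/hadoop-0.17.1
bin/hadoop dfs -mkdir /test
bin/hadoop dfs -put /tmp/test.txt /test/
bin/hadoop dfs -ls /test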

CodePudding user response:

Thanks for the hard work, OP. That's plenty.

CodePudding user response:

Very good, thanks for sharing :)

CodePudding user response:

Very good, thanks for sharing!

CodePudding user response:

Thank you, learning from this.