Spark deployment problem

Scenario: three Spark Streaming jobs (call them a, b, and c). Job a receives raw logs from Kafka, processes them, and writes the results back to Kafka under two different topics. Jobs b and c each consume one of those two topics from Kafka.
Machines: three virtual machines, each with 4 cores and 10 GB of memory (CDH 5.3.3 was installed while the VMs had 4 GB; memory was later increased to 10 GB).
Questions:
(1) In local mode, all three jobs run normally: they receive data from Kafka and process it.

/usr/lib/spark-1.3.1-bin-hadoop2.4/bin/spark-submit --class com. --jars XXX XXX.jar --master local[2] --conf spark.ui.port=4042 --executor-memory 768m yyy.jar
(All three jobs were started with the command above, each on its own machine.)

(2) In standalone mode, job a receives data and successfully writes its output to Kafka, but jobs b and c do not process any data, and it is unclear whether b and c are receiving the Kafka messages at all.

/usr/lib/spark-1.3.1-bin-hadoop2.4/bin/spark-submit --class com. --jars XXX XXX.jar --master spark://hdp5:7077 --conf spark.ui.port=4042 --executor-memory 1g --total-executor-cores 3 yyy.jar
(All three jobs were started with the command above, each on its own machine.)

I checked the standalone master web UI at hdp5:8080.

Are the standalone resources sufficient? As I understand it, each input DStream occupies one core, but there seem to be enough cores.
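A minimal sketch of the core arithmetic behind this question, written as bash helper functions whose names are hypothetical. It assumes, as the poster says, that each receiver-based input DStream pins one core for the whole life of the job, so a job needs at least one more core than it has receivers before any batch can be processed.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: every receiver-based input DStream holds one core
# for the whole run, so a job needs (receivers + 1) cores to process batches.

# min_cores RECEIVERS -> prints the minimum number of cores a job needs
min_cores() {
  local receivers=$1
  echo $(( receivers + 1 ))
}

# can_process RECEIVERS GRANTED_CORES -> exit status 0 if batches can run
can_process() {
  local receivers=$1 granted=$2
  (( granted >= $(min_cores "$receivers") ))
}

# cluster_fits TOTAL_CORES CORES_PER_JOB... -> exit status 0 if all caps fit
cluster_fits() {
  local total=$1; shift
  local sum=0 c
  for c in "$@"; do
    sum=$(( sum + c ))
  done
  (( sum <= total ))
}
```

For example, `can_process 1 2` succeeds, which matches a one-receiver job working under `local[2]`, and `cluster_fits 12 3 3 3` succeeds for three jobs capped at 3 cores on 3 × 4 = 12 cores. On paper the standalone setup therefore has enough cores, provided each job really gets its three. How many receivers b and c create depends on how their Kafka streams are built, which the post does not show.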

(3) In yarn-cluster mode, after submitting the jobs with spark-submit, only one job is in the RUNNING state; the other two stay in ACCEPTED. Why?
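One possible reason for the ACCEPTED state is YARN memory rather than cores. The sketch below estimates what one job asks YARN for in yarn-cluster mode, where the driver runs inside the ApplicationMaster container. It assumes Spark 1.3's default overhead of max(384 MB, 7% of the heap), the default of two executors, and a scheduler that rounds each request up to a 1024 MB increment; the function names are hypothetical and the real increment depends on the cluster's scheduler settings.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not from the post): overhead = max(384 MB, 7% of
# the heap) as in Spark 1.3, and each request is rounded up to INCREMENT_MB.

INCREMENT_MB=${INCREMENT_MB:-1024}

# container_mb HEAP_MB -> prints the rounded YARN container size in MB
container_mb() {
  local heap=$1
  local overhead=$(( heap * 7 / 100 ))
  if (( overhead < 384 )); then
    overhead=384
  fi
  local raw=$(( heap + overhead ))
  # Round up to the next multiple of the scheduler increment.
  echo $(( (raw + INCREMENT_MB - 1) / INCREMENT_MB * INCREMENT_MB ))
}

# app_mb DRIVER_HEAP_MB EXECUTOR_HEAP_MB NUM_EXECUTORS -> total MB for one job
app_mb() {
  local driver executor
  driver=$(container_mb "$1")      # ApplicationMaster hosting the driver
  executor=$(container_mb "$2")
  echo $(( driver + executor * $3 ))
}
```

If the yarn-cluster submissions pick up the 768m driver and 2g executor settings from the poster's spark-defaults.conf, `app_mb 768 2048 2` prints 8192, so three jobs need about 24 GB. If the NodeManagers still advertise the `yarn.nodemanager.resource.memory-mb` value configured when the VMs had 4 GB, only one job may fit and the rest wait in ACCEPTED. This is a hypothesis to check in the ResourceManager UI, not something the post confirms.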

Reply:

spark-defaults.conf:

 
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 768m
spark.executor.memory 2g
spark.streaming.unpersist true
spark.streaming.receiver.writeAheadLog.enable false
spark.local.dir /data1/data/spark,/data2/data/spark,/data3/data/spark
spark.default.parallelism 9
spark.shuffle.consolidateFiles true
spark.storage.memoryFraction 0.6
spark.driver.extraJavaOptions -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
spark.executor.extraJavaOptions -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70

spark-env.sh:

 
export JAVA_HOME=/usr/java/jdk1.7.0_67
export SCALA_HOME=/usr/lib/scala-2.11.4
export SPARK_MASTER_IP=hdp5
# export SPARK_WORKER_MEMORY=768m
SPARK_WORKER_DIR=/data3/logs/spark
export HADOOP_CONF_DIR=/etc/alternatives/hadoop-conf

export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/usr/lib/hadoop/lib/native

Reply:

Is nobody else running several Spark Streaming jobs at the same time? Deploying this is really hard; it's driving me crazy.

Reply:

It's a configuration problem. Did you limit the number of executors each job starts?

Reply:

The first job does not specify its executor cores.
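A simplified model of why an uncapped first job can starve the others in standalone mode: when `spark.cores.max` (what `--total-executor-cores` sets) is not given and `spark.deploy.defaultCores` is left at its default, the standalone master lets an application take every free core. The bash function below (a hypothetical name) grants cores to jobs in submission order, with 0 meaning "no cap"; it ignores memory limits and any reallocation after a job exits, so treat it as an illustration only.

```shell
#!/usr/bin/env bash
# Sketch only. A cap of 0 models a job with no core limit, which takes every
# free core; a capped job takes at most its cap. Memory and later
# rebalancing are ignored.

# allocate TOTAL_CORES CAP... -> prints the cores granted to each job, in order
allocate() {
  local free=$1; shift
  local cap grant
  local out=()
  for cap in "$@"; do
    if (( cap == 0 || cap > free )); then
      grant=$free
    else
      grant=$cap
    fi
    free=$(( free - grant ))
    out+=("$grant")
  done
  echo "${out[*]}"
}
```

`allocate 12 0 3 3` prints `12 0 0`: the first job takes all cores and b and c are left waiting with no executors, which would match the symptom in (2). `allocate 12 4 4 4` prints `4 4 4`, the "use all 12 cores" split suggested in the last reply.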

Reply:

Isn't --total-executor-cores 3 too few? You have 12 cores in total, so just use them all.