Spark deployment problem

Time: 09-25

Scenario: three Spark Streaming jobs (call them a, b, and c). Job a receives raw logs from Kafka, processes them, and writes the results to two different Kafka topics; jobs b and c each consume one of those two topics.
Machines: three virtual machines, each with 4 cores and 10 GB of memory (CDH 5.3.3 is installed; the VMs originally had 4 GB and were later enlarged to 10 GB).
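For concreteness, here is a minimal sketch of what job a's topology might look like against Spark 1.3's receiver-based Kafka API. The class name, topic names, ZooKeeper/broker addresses, batch interval, and the pass-through "processing" are all assumptions for illustration, not details from the post:

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object JobA {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("job-a"), Seconds(10))

    // Receiver-based input DStream (Spark 1.3 API): the receiver pins one
    // executor core for the lifetime of the job.
    val raw = KafkaUtils.createStream(ssc, "hdp5:2181", "job-a-group", Map("raw-logs" -> 1))

    raw.map(_._2).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Kafka 0.8 producer, created once per partition per batch.
        val props = new Properties()
        props.put("metadata.broker.list", "hdp5:9092")
        props.put("serializer.class", "kafka.serializer.StringEncoder")
        val producer = new Producer[String, String](new ProducerConfig(props))
        records.foreach { line =>
          // Actual processing elided; fan each record out to the two
          // downstream topics that jobs b and c consume.
          producer.send(new KeyedMessage[String, String]("topic-b", line))
          producer.send(new KeyedMessage[String, String]("topic-c", line))
        }
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Jobs b and c would each mirror the createStream call, subscribing to topic-b or topic-c respectively.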
Question:
(1) In local deployment mode, all three jobs run normally: they receive from Kafka and process the data.

/usr/lib/spark-1.3.1-bin-hadoop2.4/bin/spark-submit --class com.XXX --jars XXX.jar --master local[2] --conf spark.ui.port=4042 --executor-memory 768m yyy.jar
(All three jobs are launched with the command above, one on each of the three machines.)

(2) In standalone deployment mode, job a receives data and successfully produces to Kafka, but jobs b and c process no data; whether b and c ever receive the Kafka messages is unknown.

/usr/lib/spark-1.3.1-bin-hadoop2.4/bin/spark-submit --class com.XXX --jars XXX.jar --master spark://hdp5:7077 --conf spark.ui.port=4042 --executor-memory 1g --total-executor-cores 3 yyy.jar
(All three jobs are launched with the command above, one on each of the three machines.)

I checked the master web UI at hdp5:8080 (screenshot not preserved).
Is standalone mode short on resources? My understanding is that each input DStream (receiver) occupies one core, but by my count the cores should still be sufficient.
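For reference, that core audit spelled out (figures from the submit commands above; one receiver per job follows from the one-input-stream-per-job scenario):

  per job:   --total-executor-cores 3, minus 1 core pinned by the receiver = 2 cores left for processing
  cluster:   3 machines x 4 cores = 12 cores; 3 jobs x 3 cores = 9 cores requested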

(3) In yarn-cluster deployment mode, after the three jobs are submitted via spark-submit and run, only one job reaches the RUNNING state; the other two stay stuck in ACCEPTED. Why?
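(One commonly cited cause of applications parking in ACCEPTED is the scheduler's cap on ApplicationMaster resources; in yarn-cluster mode each streaming driver runs inside an AM that must be admitted before the job can run. Whether this cluster uses the Capacity Scheduler is an assumption on my part; CDH frequently defaults to the Fair Scheduler, where the analogous per-queue setting is maxAMShare. A hedged sketch of the Capacity Scheduler change, in capacity-scheduler.xml:

<!-- Default is 0.1, which on a small cluster can leave room for only one
     ApplicationMaster at a time; raising it lets all three drivers start. -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
)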

CodePudding user response:

spark-defaults.conf:

spark.serializer                org.apache.spark.serializer.KryoSerializer
spark.driver.memory             768m
spark.executor.memory           2g
spark.streaming.unpersist       true
spark.streaming.receiver.writeAheadLog.enable false
spark.local.dir                 /data1/data/spark,/data2/data/spark,/data3/data/spark
spark.default.parallelism      9
spark.shuffle.consolidateFiles  true
spark.storage.memoryFraction    0.6
spark.driver.extraJavaOptions   -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
spark.executor.extraJavaOptions -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70


spark-env.sh:

export JAVA_HOME=/usr/java/jdk1.7.0_67
export SCALA_HOME=/usr/lib/scala-2.11.4
export SPARK_MASTER_IP=hdp5
# export SPARK_WORKER_MEMORY=768m
SPARK_WORKER_DIR=/data3/logs/spark
export HADOOP_CONF_DIR=/etc/alternatives/hadoop-conf

export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/usr/lib/hadoop/lib/native


CodePudding user response:

I've never had the scenario of multiple Spark Streaming jobs running at the same time before, so working out how to deploy this is proving difficult. I'm going dizzy.

CodePudding user response:

Probably a configuration problem. Have you capped the number of executors that can start?

CodePudding user response:

First of all, the jobs don't specify --executor-cores.
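If that refers to the yarn-cluster runs, an explicit per-job resource request would look something like the sketch below (the numbers are illustrative, not from the thread). Note that in Spark 1.3, --executor-cores is a YARN-only option; in standalone mode the per-application cap is --total-executor-cores.

/usr/lib/spark-1.3.1-bin-hadoop2.4/bin/spark-submit --class com.XXX --jars XXX.jar --master yarn-cluster --num-executors 2 --executor-cores 2 --executor-memory 1g yyy.jar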

CodePudding user response:

Isn't --total-executor-cores 3 too few? You have 12 cores in total; open them all up.
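Spelled out, that suggestion would look something like this per job (a sketch only; 4 cores per job is one possible split of the 12, and it assumes the drivers run in client mode on the submitting machines):

/usr/lib/spark-1.3.1-bin-hadoop2.4/bin/spark-submit --class com.XXX --jars XXX.jar --master spark://hdp5:7077 --conf spark.ui.port=4042 --executor-memory 1g --total-executor-cores 4 yyy.jar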