spark.shuffle.service.enabled=true cluster.YarnScheduler: Initial job has not accepted any resources


I am trying to run a PySpark job on YARN with the spark.shuffle.service.enabled=true option, but the job never completes:

Without the option, the job works well:

user@e7524bf7f996:~$ pyspark --master yarn                                                               
Using Python version 3.9.7 (default, Sep 16 2021 13:09:58)
Spark context Web UI available at http://e7524bf7f996:4040
Spark context available as 'sc' (master = yarn, app id = application_1644937120225_0004).
SparkSession available as 'spark'.
>>> sc.parallelize(range(10)).sum()
45       

With the option --conf spark.shuffle.service.enabled=true:

user@e7524bf7f996:~$ pyspark --master yarn --conf spark.shuffle.service.enabled=true
Using Python version 3.9.7 (default, Sep 16 2021 13:09:58)
Spark context Web UI available at http://e7524bf7f996:4040
Spark context available as 'sc' (master = yarn, app id = application_1644937120225_0005).
SparkSession available as 'spark'.
>>> sc.parallelize(range(10)).sum()
2022-02-15 15:10:14,591 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-15 15:10:29,590 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-15 15:10:44,591 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Are there other options in Spark or YARN that should be enabled to make spark.shuffle.service.enabled work?

I am running Spark 3.1.2, Python 3.9.7, and Hadoop 3.2.1.

Thank you,

Bertrand

CodePudding user response:

You need to configure the external shuffle service on the YARN cluster by following these steps:

  1. Build Spark with the YARN profile. Skip this step if you are using a pre-packaged distribution.
  2. Locate the spark-&lt;version&gt;-yarn-shuffle.jar. This should be under $SPARK_HOME/common/network-yarn/target/scala-&lt;version&gt; if you are building Spark yourself, and under yarn if you are using a distribution.
  3. Add this jar to the classpath of all NodeManagers in your cluster.
  4. In the yarn-site.xml on each node, add spark_shuffle to yarn.nodemanager.aux-services, then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService (see the snippet after this list).
  5. Increase NodeManager's heap size by setting YARN_HEAPSIZE (1000 by default) in etc/hadoop/yarn-env.sh to avoid garbage collection issues during shuffle.
  6. Restart all NodeManagers in your cluster.
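
As a sketch, the yarn-site.xml entries from step 4 would look like the following (assuming mapreduce_shuffle is already configured as an aux-service; keep any values already present in your list):

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>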

For details, please refer to https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service

If it is still not working, check the following:

  1. Check the YARN UI to ensure enough resources are available.
  2. Try --deploy-mode cluster to ensure the driver can communicate with the YARN cluster for scheduling (see the example after this list).
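
For example, a minimal sketch of a cluster-mode submission (your_job.py is a hypothetical script; the interactive pyspark shell only supports client deploy mode, so cluster mode requires spark-submit):

    spark-submit --master yarn --deploy-mode cluster \
        --conf spark.shuffle.service.enabled=true \
        your_job.py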

CodePudding user response:

Thanks, Warren, for your help.

Here is the setup that works for me:

https://github.com/BertrandBrelier/SparkYarn/blob/main/yarn-site.xml

echo "export YARN_HEAPSIZE=2000" >> /home/user/hadoop-3.2.1/etc/hadoop/yarn-env.sh

ln -s /home/user/spark-3.1.2-bin-hadoop3.2/yarn/spark-3.1.2-yarn-shuffle.jar /home/user/hadoop-3.2.1/share/hadoop/yarn/lib/.

echo "spark.shuffle.service.enabled    true" >> /home/user/spark-3.1.2-bin-hadoop3.2/conf/spark-defaults.conf

After restarting Hadoop and Spark, I was able to start a PySpark session:

pyspark --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true
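
If you want to bound how many executors dynamic allocation may request, the standard spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors settings can be added to the same command (a sketch; the values 1 and 4 are arbitrary examples):

    pyspark --conf spark.shuffle.service.enabled=true \
        --conf spark.dynamicAllocation.enabled=true \
        --conf spark.dynamicAllocation.minExecutors=1 \
        --conf spark.dynamicAllocation.maxExecutors=4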