Starting a PySpark session takes a lot of time


Hello, I am new to PySpark and I'm stuck on this line of code:

spark = SparkSession.builder.appName('HelloWorld').getOrCreate()

Launching the Spark session never finishes: I've waited for more than 100 minutes and nothing happens, it's still running. Can anyone explain how to resolve this problem?

CodePudding user response:

Try providing the master details and see if that helps. It appears that your Spark session is unable to locate the master daemon:

spark = SparkSession.builder.master("local").appName("test").getOrCreate()

CodePudding user response:

As suggested in the other answer/comment, there might be an issue with reaching the Spark server. If you can start a session with master('local'), then that is almost certainly the problem.

If you are connecting to a remote Spark server, the problem may be on that side, for instance a lack of available resources, in which case you will need to contact the server's administrator.
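
For illustration, connecting to a standalone cluster usually looks something like the sketch below; the master URL and resource settings are placeholders, not values taken from the question:

from pyspark.sql import SparkSession

# Hypothetical standalone master URL - replace with your server's address and port
spark = (SparkSession.builder
         .master("spark://spark-master.example.com:7077")
         .appName("test")
         # request modest resources so the cluster can actually schedule the app
         .config("spark.executor.memory", "2g")
         .config("spark.cores.max", "2")
         .getOrCreate())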

Set logging level to DEBUG

To find out what's going on, you can increase the logging level to DEBUG. First you need to locate the logging (log4j) configuration file:

import os
print(os.environ['SPARK_HOME'])
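
Note that SPARK_HOME may not be set, for example with a pip-installed pyspark, in which case the line above raises a KeyError; a more defensive sketch:

import os

# os.environ.get returns None instead of raising if the variable is missing
spark_home = os.environ.get('SPARK_HOME')
print(spark_home if spark_home else 'SPARK_HOME is not set')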

The file is called log4j.properties and should be found in the conf subfolder of $SPARK_HOME:

os.path.join(os.environ['SPARK_HOME'], 'conf')
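
Putting the two together, a small sketch that reports whether the configuration file (or only its template, see below) is present, assuming SPARK_HOME is set:

import os

conf_dir = os.path.join(os.environ['SPARK_HOME'], 'conf')
# check for both the active configuration file and the shipped template
for name in ('log4j.properties', 'log4j.properties.template'):
    path = os.path.join(conf_dir, name)
    print(path, '-> exists' if os.path.exists(path) else '-> missing')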

If there is no log4j.properties file in conf, there should be a log4j.properties.template. Copy the template to log4j.properties (a Python sketch for the copy follows the snippet below) and make sure that it contains these lines (the relevant one is log4j.rootCategory=DEBUG, console):

# Set everything to be logged to the console
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
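
The copy itself can also be done from Python; a minimal sketch, assuming you have write permission in the conf folder:

import os
import shutil

conf_dir = os.path.join(os.environ['SPARK_HOME'], 'conf')
template = os.path.join(conf_dir, 'log4j.properties.template')
target = os.path.join(conf_dir, 'log4j.properties')

# copy the template only if log4j.properties does not exist yet,
# then edit log4j.rootCategory to DEBUG as shown above
if not os.path.exists(target):
    shutil.copyfile(template, target)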

Start a new shell or pyspark and see what messages you get when attempting to start a Spark session.
