pyspark setup issues on anaconda

Time:01-17

I'm trying to learn to use PySpark with Jupyter notebooks. I created a conda environment for PySpark and installed it through Anaconda. The Python version is 3.10.8, and the Java version in the environment is:

openjdk 17.0.3 2022-04-19 LTS
OpenJDK Runtime Environment Zulu17.34+19-CA (build 17.0.3+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.34+19-CA (build 17.0.3+7-LTS, mixed mode, sharing)

When I open JupyterLab and try to start my first Spark session, I run:

import pyspark 
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("JupNote").getOrCreate()

and get the error:

Py4JJavaError                             Traceback (most recent call last)
c:\Users\frezanlutu\Skills_Training\BigData\pyspark.ipynb Cell 3 in <cell line: 1>()
----> 1 spark = SparkSession.builder.appName("JupNote").getOrCreate()

File c:\Users\frezanlutu\.conda\envs\pyspark-env\lib\site-packages\pyspark\sql\session.py:228, in SparkSession.Builder.getOrCreate(self)
    226         sparkConf.set(key, value)
    227     # This SparkContext may be an existing one.
--> 228     sc = SparkContext.getOrCreate(sparkConf)
    229 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    230 # by all sessions.
    231 session = SparkSession(sc)

File c:\Users\frezanlutu\.conda\envs\pyspark-env\lib\site-packages\pyspark\context.py:392, in SparkContext.getOrCreate(cls, conf)
    390 with SparkContext._lock:
    391     if SparkContext._active_spark_context is None:
--> 392         SparkContext(conf=conf or SparkConf())
    393     return SparkContext._active_spark_context

File c:\Users\frezanlutu\.conda\envs\pyspark-env\lib\site-packages\pyspark\context.py:146, in SparkContext.__init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145 try:
--> 146     self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
    147                   conf, jsc, profiler_cls)
    148 except:
    149     # If an error occurs, clean up in order to allow future SparkContext creation:
    150     self.stop()

File c:\Users\frezanlutu\.conda\envs\pyspark-env\lib\site-packages\pyspark\context.py:209, in SparkContext._do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler_cls)
    206 self.environment["PYTHONHASHSEED"] = os.environ.get("PYTHONHASHSEED", "0")
    208 # Create the Java SparkContext through Py4J
--> 209 self._jsc = jsc or self._initialize_context(self._conf._jconf)
    210 # Reset the SparkConf to the one actually used by the SparkContext in JVM.
    211 self._conf = SparkConf(_jconf=self._jsc.sc().conf())

File c:\Users\frezanlutu\.conda\envs\pyspark-env\lib\site-packages\pyspark\context.py:329, in SparkContext._initialize_context(self, jconf)
    325 def _initialize_context(self, jconf):
    326     """
    327     Initialize SparkContext in function to allow subclass specific initialization
    328     """
--> 329     return self._jvm.JavaSparkContext(jconf)

File c:\Users\frezanlutu\.conda\envs\pyspark-env\lib\site-packages\py4j\java_gateway.py:1585, in JavaClass.__call__(self, *args)
   1579 command = proto.CONSTRUCTOR_COMMAND_NAME +\
   1580     self._command_header +\
   1581     args_command +\
   1582     proto.END_COMMAND_PART
   1584 answer = self._gateway_client.send_command(command)
-> 1585 return_value = get_return_value(
   1586     answer, self._gateway_client, None, self._fqn)
   1588 for temp_arg in temp_args:
   1589     temp_arg._detach()

File c:\Users\frezanlutu\.conda\envs\pyspark-env\lib\site-packages\py4j\protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
    at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
    at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
    at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:833)

I've also tried:

spark = SparkSession.builder.config("spark.driver.host", "localhost").appName("JupNote").getOrCreate()

I found this suggestion while searching for solutions, but it produces the same error. Does anyone know whether I'm missing something or doing something wrong?

CodePudding user response:

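Which Spark version are you using? Java 17 is supported only from Spark 3.3.0 onward; earlier 3.x releases support Java 8 and 11, and starting them on Java 17 can fail during SparkContext creation with an error like the one in the question. Either upgrade PySpark to 3.3.0 or later, or switch the environment to Java 8 or 11. See https://spark.apache.org/docs/3.3.0/#downloading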
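
To check the combination before starting a session, something like the following could help. It is only a sketch: `parse_java_major` and `spark_supports_java17` are hypothetical helper names, not part of PySpark, and the 3.3.0 cutoff is the one mentioned in the answer above.

```python
import re


def parse_java_major(version_output: str) -> int:
    """Return the Java major version from `java -version` style output."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    # The first number on the first line is the version, e.g. 17.0.3 or 1.8.0_292.
    match = re.search(r"(\d+)(?:\.(\d+))?", lines[0])
    if match is None:
        raise ValueError(f"no version number found in: {lines[0]!r}")
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_292" denotes Java 8.
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def spark_supports_java17(spark_version: str) -> bool:
    """True if the Spark version is 3.3 or later (assumed Java 17 cutoff)."""
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return (major, minor) >= (3, 3)


# In the notebook (assumes pyspark is importable and `java` is on PATH;
# `java -version` writes to stderr):
#   import subprocess, pyspark
#   java_out = subprocess.run(["java", "-version"],
#                             capture_output=True, text=True).stderr
#   print(pyspark.__version__, parse_java_major(java_out))

print(parse_java_major("openjdk 17.0.3 2022-04-19 LTS"))  # prints 17
```

If `parse_java_major` returns 17 and `spark_supports_java17(pyspark.__version__)` is False, the environment has the mismatch described above.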
