I have been looking for a solution to this problem for about 5 hours, so I am quite annoyed at this point.
In essence, I get a few warnings:
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/24 00:20:03 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
In my system variables, I have:
HADOOP_HOME = C:\spark-3.2.0-bin-hadoop3.2
SPARK_HOME = C:\spark-3.2.0-bin-hadoop3.2
JAVA_HOME = C:\Program Files\Java\jdk-17.0.1
My Path system variable also includes %SPARK_HOME%\bin, and winutils.exe is in C:\spark-3.2.0-bin-hadoop3.2\bin, as it should be. The pyspark command from the command prompt should work. I know similar questions have been answered, but when I try those other answers, I keep getting errors. I don't quite know what is going on... Thanks in advance for any help.
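As a quick sanity check (standard library only, nothing Spark-specific), the variables do show up from Python:

    import os

    # Confirm the variables listed above are visible to a plain Python process
    for var in ("HADOOP_HOME", "SPARK_HOME", "JAVA_HOME"):
        print(var, "=", os.environ.get(var, "<not set>"))

Yet running pyspark keeps failing with the output below: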
UserWarning: Failed to initialize Spark session.
warnings.warn("Failed to initialize Spark session.")
And the kiss of death:
SUCCESS: The process with PID 33244 (child process of PID 12556) has been terminated.
SUCCESS: The process with PID 12556 (child process of PID 13404) has been terminated.
CodePudding user response:
From your input, some possible additional checks:
Add the PYSPARK_PYTHON environment variable pointing at the Python executable of the environment you use - here a conda env named "spark" (see the sketch after these checks):
PYSPARK_PYTHON="C:\Users\<my user>\AppData\Local\Continuum\anaconda3\envs\spark\python.exe"
Run the Spark prompt with admin permissions (i.e. right-click cmd and choose "Run as administrator"). Spark opens local ports, which can be denied to a normal user.
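A minimal sketch of the first check: setting PYSPARK_PYTHON before the session starts and then trying to bring up a local session. The path is the example placeholder from above; substitute your own user name and env location.

    import os
    from pyspark.sql import SparkSession

    # Point PySpark at the conda env's interpreter before the session starts.
    # This path is a placeholder - adjust it for your machine.
    os.environ["PYSPARK_PYTHON"] = r"C:\Users\<my user>\AppData\Local\Continuum\anaconda3\envs\spark\python.exe"

    spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
    print(spark.version)  # if this prints, the session initialized
    spark.stop()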
CodePudding user response:
You need winutils.exe in the %HADOOP_HOME%\bin folder for it to work.
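A quick way to confirm the layout - a sketch that only checks the file sits where it is expected, relative to HADOOP_HOME:

    import os

    # HADOOP_HOME should point at the install root, e.g. C:\spark-3.2.0-bin-hadoop3.2
    hadoop_home = os.environ.get("HADOOP_HOME", "")
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    print("HADOOP_HOME =", hadoop_home or "<not set>")
    print("winutils.exe found:", os.path.isfile(winutils))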