I have been looking for a solution to this problem for about 5 hours, so I am quite annoyed at this point.
In essence, I get a few warnings:
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/24 00:20:03 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
In my system variables, I have:
HADOOP_HOME = C:\spark-3.2.0-bin-hadoop3.2
SPARK_HOME = C:\spark-3.2.0-bin-hadoop3.2
JAVA_HOME = C:\Program Files\Java\jdk-17.0.1
My Path system variable also includes %SPARK_HOME%\bin, and winutils.exe is in C:\spark-3.2.0-bin-hadoop3.2\bin, as it should be. The pyspark command from the command prompt should work. I know similar questions have been answered, but when I try those other answers, I keep getting errors. I don't quite know what is going on... Thanks in advance for any help.
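As a quick sanity check (standard library only, nothing Spark-specific), the variables do show up from Python:

    import os

    # Confirm the variables listed above are visible to a plain Python process
    for var in ("HADOOP_HOME", "SPARK_HOME", "JAVA_HOME"):
        print(var, "=", os.environ.get(var, "<not set>"))

Yet running pyspark keeps failing with the output below: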
UserWarning: Failed to initialize Spark session.
warnings.warn("Failed to initialize Spark session.")
And the kiss of death:
SUCCESS: The process with PID 33244 (child process of PID 12556) has been terminated.
SUCCESS: The process with PID 12556 (child process of PID 13404) has been terminated.
CodePudding user response:
From your input, some possible additional checks:
Add the PYSPARK_PYTHON environment variable pointing at the Python executable of the environment you use - here a conda env named "spark" (see the sketch after these checks):
PYSPARK_PYTHON="C:\Users\<my user>\AppData\Local\Continuum\anaconda3\envs\spark\python.exe"
Run the Spark prompt with admin permissions (i.e. right-click cmd and choose "Run as administrator"). Spark opens local ports, which can be denied to a normal user.
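A minimal sketch of the first check: setting PYSPARK_PYTHON before the session starts and then trying to bring up a local session. The path is the example placeholder from above; substitute your own user name and env location.

    import os
    from pyspark.sql import SparkSession

    # Point PySpark at the conda env's interpreter before the session starts.
    # This path is a placeholder - adjust it for your machine.
    os.environ["PYSPARK_PYTHON"] = r"C:\Users\<my user>\AppData\Local\Continuum\anaconda3\envs\spark\python.exe"

    spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
    print(spark.version)  # if this prints, the session initialized
    spark.stop()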
CodePudding user response:
You need winutils.exe in the %HADOOP_HOME%\bin folder for it to work.
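A quick way to confirm the layout - a sketch that only checks the file sits where it is expected, relative to HADOOP_HOME:

    import os

    # HADOOP_HOME should point at the install root, e.g. C:\spark-3.2.0-bin-hadoop3.2
    hadoop_home = os.environ.get("HADOOP_HOME", "")
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    print("HADOOP_HOME =", hadoop_home or "<not set>")
    print("winutils.exe found:", os.path.isfile(winutils))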