I have installed Spark 3.3.1 and it was previously running fine with both the spark-shell and pyspark commands. But after I installed Hadoop 3.3.1, the pyspark command no longer seems to work properly; this is the output of running it:
C:\Users\A>pyspark2 --num-executors 4 --executor-memory 1g
[I 2022-11-20 22:36:09.100 LabApp] JupyterLab extension loaded from C:\Users\A\AppData\Local\Programs\Python\Python311\Lib\site-packages\jupyterlab
[I 2022-11-20 22:36:09.100 LabApp] JupyterLab application directory is C:\Users\A\AppData\Local\Programs\Python\Python311\share\jupyter\lab
[I 22:36:09.107 NotebookApp] Serving notebooks from local directory: C:\Users\A
[I 22:36:09.107 NotebookApp] Jupyter Notebook 6.5.2 is running at:
[I 22:36:09.107 NotebookApp] http://localhost:8888/?token=0fca9f0378976c7af19886970c9e801ac27a8d1a209528db
[I 22:36:09.108 NotebookApp] or http://127.0.0.1:8888/?token=0fca9f0378976c7af19886970c9e801ac27a8d1a209528db
[I 22:36:09.108 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 22:36:09.189 NotebookApp]
To access the notebook, open this file in a browser:
file:///C:/Users/A/AppData/Roaming/jupyter/runtime/nbserver-8328-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=0fca9f0378976c7af19886970c9e801ac27a8d1a209528db
or http://127.0.0.1:8888/?token=0fca9f0378976c7af19886970c9e801ac27a8d1a209528db
0.01s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
It opens the Jupyter notebook, but the Spark logo isn't shown and the Python shell is no longer available in CMD as it was before. However, spark-shell still works, as shown below:
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://168.150.8.52:4040
Spark context available as 'sc' (master = local[*], app id = local-1669062477403).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.3.1
/_/
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.16.1)
Type in expressions to have them evaluated.
Type :help for more information.
scala> 22/11/21 12:28:12 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
scala>
CodePudding user response:
Your PATH has been altered to use Spark's Python distribution. You can learn more about this here.
Try:
echo $PATH
(or echo %PATH% in a Windows CMD prompt) and look at how many Pythons show up. I bet you have more than one.
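If you'd rather check from Python itself, here's a minimal sketch (standard library only, nothing Spark-specific) that lists every python executable reachable from your PATH and the interpreter currently running:

import os
import shutil
import sys

# The interpreter that is actually executing this script
print("Current interpreter:", sys.executable)

# Walk each PATH entry and look for python executables
for entry in os.environ.get("PATH", "").split(os.pathsep):
    for name in ("python.exe", "python3.exe", "python"):
        candidate = os.path.join(entry, name)
        if os.path.isfile(candidate):
            print("Found on PATH:", candidate)

# shutil.which shows which one wins when you just type "python"
print("First python on PATH:", shutil.which("python"))

If more than one entry prints, whichever comes first in PATH is the one pyspark will pick up.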
CodePudding user response:
It opens the Jupyter notebook but the Spark logo doesn't shown and Python shell wouldn't be available
Jupyter is a Python shell (by default).
Spark doesn't come with a pyspark2 command, so it seems you've done some customizing of your environment. Also, pyspark only opens Jupyter by default if you've set specific environment variables (PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS) telling it to do so.
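As a quick sanity check (a minimal sketch, assuming the usual PySpark driver variables are what's launching Jupyter on your machine), print the relevant settings from any Python prompt:

import os

# The variables pyspark consults when deciding which Python / driver to launch
for var in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON", "PYSPARK_DRIVER_PYTHON_OPTS"):
    print(var, "=", os.environ.get(var, "<not set>"))

If PYSPARK_DRIVER_PYTHON points at jupyter, that explains why the notebook opens instead of the plain Python shell.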
The logo isn't necessary to tell you it's working. Try creating a session:
from pyspark.sql import SparkSession

# Note: the builder is required; SparkSession.appName(...) on its own is not valid
spark = SparkSession.builder.appName("test").getOrCreate()
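If that call returns without errors, spark.version should print 3.3.1 and the session is usable, logo or not.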