Can I run pyspark locally without installing spark on windows 10?

Time:12-14

I need to create a proof of concept using PySpark, and I was wondering whether there is a way to install it and use it via pip without having to install and configure Spark itself. I've read a few answers suggesting that newer versions of PySpark let you run it in standalone mode without needing the full Spark installation, but when I try that I get the following error:

Traceback (most recent call last):
  File "C:\Users\320181940\PycharmProjects\meetup\main.py", line 8, in <module>
    sc = SparkContext("local", "meetup_etl")
  File "C:\Users\320181940\PycharmProjects\meetup\venv\lib\site-packages\pyspark\context.py", line 144, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Users\320181940\PycharmProjects\meetup\venv\lib\site-packages\pyspark\context.py", line 331, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "C:\Users\320181940\PycharmProjects\meetup\venv\lib\site-packages\pyspark\java_gateway.py", line 101, in launch_gateway
    proc = Popen(command, **popen_kwargs)
  File "C:\Python310\lib\subprocess.py", line 966, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python310\lib\subprocess.py", line 1435, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

I installed PySpark 3.1.3 using pip, and I'm trying to run this on Windows 10. Any help would be much appreciated.

CodePudding user response:

You need to install Java and set the JAVA_HOME environment variable to point to your Java installation. PySpark starts a JVM in the background, so it cannot run without one.
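A quick way to confirm that Java is visible to Python before starting Spark is a small check like the one below. This is only a sketch: the helper name `find_java` is made up for illustration, and it simply looks in `JAVA_HOME\bin` first and then on `PATH`. It does not verify that the Java version is one your PySpark release supports.

```python
import os
import shutil


def find_java(env=None):
    """Return the path to a java executable, or None if none can be located.

    `find_java` is a hypothetical helper, not part of PySpark.
    """
    env = os.environ if env is None else env

    java_home = env.get("JAVA_HOME")
    if java_home:
        # Look inside JAVA_HOME/bin first (java.exe on Windows).
        candidate = shutil.which("java", path=os.path.join(java_home, "bin"))
        if candidate:
            return candidate

    # Fall back to whatever java is reachable through PATH.
    return shutil.which("java", path=env.get("PATH", ""))


if __name__ == "__main__":
    java = find_java()
    print(java or "No java executable found; install a JDK and set JAVA_HOME")
```

If this prints a path, the JVM should at least be discoverable; if it prints the fallback message, fix the Java installation before debugging PySpark itself.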

CodePudding user response:

Start a Python interpreter, create a Spark session, and run your code. Here's an example:

from pyspark.sql import SparkSession

# Run Spark locally, using all available cores
spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame(
    [
        ["I'm ready!"],
        ["If I could put into words how much I love waking up at 6 am on Mondays I would."],
    ]
).toDF("text")
df.show()

Also make sure to set up HADOOP_HOME as described in this gist.
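If you would rather not change system-wide settings, the variable can also be set from Python before the Spark session is created, since the JVM that PySpark launches inherits the environment of the Python process. The sketch below assumes the usual Windows layout, where `winutils.exe` sits in a `bin` folder under `HADOOP_HOME`; the helper name `configure_hadoop_home` and the path `C:\hadoop` are placeholders, not something taken from the gist.

```python
import os


def configure_hadoop_home(hadoop_home, env=None):
    """Set HADOOP_HOME and put its bin directory at the front of PATH.

    `configure_hadoop_home` is a hypothetical helper; `hadoop_home` is assumed
    to be a folder that contains bin\\winutils.exe.
    """
    env = os.environ if env is None else env
    env["HADOOP_HOME"] = hadoop_home

    bin_dir = os.path.join(hadoop_home, "bin")
    current = env.get("PATH", "")
    paths = current.split(os.pathsep) if current else []
    if bin_dir not in paths:
        # Prepend so this bin directory wins over any other copy on PATH.
        env["PATH"] = os.pathsep.join([bin_dir] + paths)
    return env


# Usage (placeholder path), before SparkSession.builder...getOrCreate():
# configure_hadoop_home(r"C:\hadoop")
```

Calling it after the session already exists has no effect on the running JVM, so it needs to come first in the script.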
