The Problem
When I try to import pyspark in a Python script in PyCharm, I get the error below (cannot import x from y). I checked the directory, and the module that should be imported is not present.
That's all there is in pyspark\cloudpickle\
Why is it not installed? What could be possible problems?
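One way to see what is actually on disk, without triggering the failing import, is to locate the package first and then list the folder. This is only a minimal sketch: list_subpackage_files is a made-up helper name, and it relies on importlib.util.find_spec not executing a top-level package when it looks it up.

```python
import importlib.util
from pathlib import Path
from typing import List


def list_subpackage_files(package: str, subdir: str) -> List[str]:
    """Return sorted entry names in <package>/<subdir> without importing the package."""
    # For a top-level name, find_spec only locates the package; it does not run it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []
    names = []
    for location in spec.submodule_search_locations:
        target = Path(location) / subdir
        if target.is_dir():
            names.extend(entry.name for entry in target.iterdir())
    return sorted(names)


if __name__ == "__main__":
    # Prints an empty list if pyspark cannot be found in this interpreter.
    print(list_subpackage_files("pyspark", "cloudpickle"))
```

Running this with the interpreter that PyCharm uses for the project shows whether that environment sees the same files as the folder inspected by hand.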
What I tried
Compatibility issues? I found this which looks similar, but my error says "cannot import name".
I also found this about cloudpickle specifically. I tried cloudpickle=1.1.1, but it didn't work for me.
I also created a new env, reinstalled pyspark, and rebooted, but that didn't help either.
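One thing worth ruling out at this point is leftovers from more than one pyspark install visible to the same interpreter. The following is a rough sketch, assuming the install wrote metadata that importlib.metadata can see (pip and conda normally do); installed_versions is an illustrative helper name, not part of any library.

```python
from importlib import metadata
from typing import List


def installed_versions(dist_name: str) -> List[str]:
    """Return the version of every matching distribution found on sys.path."""
    # Distribution.discover yields one entry per metadata directory found,
    # so duplicates on sys.path show up as several versions.
    return [dist.version for dist in metadata.Distribution.discover(name=dist_name)]


if __name__ == "__main__":
    print(installed_versions("pyspark"))
```

More than one entry, or a version that differs from what conda list reports, would point to a mixed or stale install rather than a missing module.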
import findspark
findspark.init()
This works without error.
Obviously I'm new to Spark/PySpark and might be missing the obvious...
Error
import pyspark
Traceback (most recent call last):
  File "C:\Users\me\anaconda3\envs\myenv\lib\site-packages\IPython\core\interactiveshell.py", line 3397, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-d008122bb79d>", line 3, in <cell line: 3>
    from pyspark.sql import Row
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.3.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\me\anaconda3\envs\myenv\lib\site-packages\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.3.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\me\anaconda3\envs\myenv\lib\site-packages\pyspark\context.py", line 33, in <module>
    from pyspark.broadcast import Broadcast, BroadcastPickleRegistry
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.3.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\me\anaconda3\envs\myenv\lib\site-packages\pyspark\broadcast.py", line 25, in <module>
    from pyspark.cloudpickle import print_exec
ImportError: cannot import name 'print_exec' from 'pyspark.cloudpickle' (C:\Users\me\anaconda3\envs\myenv\lib\site-packages\pyspark\cloudpickle\__init__.py)
Specs
I am working in PyCharm IDE (PyCharm Community Edition 2021.3.1)
Python 3.10.4 | packaged by conda-forge | (main, Mar 30 2022, 08:38:02) [MSC v.1916 64 bit (AMD64)]
>conda list | grep pyspark
pyspark 3.2.1
>conda info
conda version : 4.12.0
conda-build version : 3.20.5
python version : 3.8.5.final.0
CodePudding user response:
Check whether the Python lib directory is in your path.
CodePudding user response:
Since the files were not there, I just downloaded pyspark manually from the website and replaced the previous pyspark installation with the newly downloaded one.
This got rid of all the import errors.
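To confirm that a reinstall like this really fixed things, a quick smoke check along these lines can help. It is only a sketch: try_import is a made-up helper, and the module names are simply the ones that appear in the traceback from the question.

```python
import importlib
from typing import Tuple


def try_import(module_name: str) -> Tuple[bool, str]:
    """Return (True, "") if the module imports cleanly, else (False, error message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # Modules taken from the traceback in the question.
    for name in ("pyspark", "pyspark.sql", "pyspark.broadcast"):
        ok, err = try_import(name)
        print(name, "OK" if ok else "FAILED: " + err)
```

If any line still reports FAILED, the message shows which file the import was resolved from, which helps tell an old copy from the new one.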