I have a spark-submit command that uses --py-files. When I run the script, it fails with this error: ModuleNotFoundError: No module named 'apps', even though the imports are written correctly.
My code looks like:
spark_submit = '''
/opt/spark/bin/spark-submit \
--master spark://spark-master:7077 \
--jars /opt/spark-jars/postgresql-42.2.22.jar \
--driver-cores {} \
--driver-memory {} \
--executor-cores {} \
--executor-memory {} \
--num-executors {} \
--py-files /opt/spark-apps/l2lpt/models/FBP.py,/opt/spark-apps/l2lpt/utils/exceptions.py,/opt/spark-apps/l2lpt/anomaly_detector/anomaly_detector.py,/opt/spark-apps/l2lpt/l2lpt.py \
/opt/spark-apps/l2lpt/main.py {} {}
'''.format(
driver_cores,
driver_memory,
executor_cores,
executor_memory,
num_executors,
offset,
timestamp)
Does the order in which the .py files are added matter? I can't work out what order they need to be listed in. Do I need to add every .py file that my main() function ends up calling?
CodePudding user response:
When you spark-submit a PySpark application (Spark with Python), you specify the .py file you want to run and pass the dependency libraries as .egg or .zip files.
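For example, here is a minimal sketch of that approach, reusing the paths from the question and assuming the missing 'apps' module is a package directory somewhere under /opt/spark-apps (that layout is an assumption, adjust it to your project): zip the package so its directory structure, and therefore the package name, is preserved, then pass the single archive to --py-files instead of listing individual modules.

# build an archive that keeps the package layout intact (assumed layout)
cd /opt/spark-apps
zip -r l2lpt.zip l2lpt/

# submit the driver script and ship the archive with it
/opt/spark/bin/spark-submit \
--master spark://spark-master:7077 \
--jars /opt/spark-jars/postgresql-42.2.22.jar \
--py-files /opt/spark-apps/l2lpt.zip \
/opt/spark-apps/l2lpt/main.py

The order of the entries in --py-files does not matter; each listed file or archive is simply added to the Python search path on the driver and executors. What does matter is that an import like from apps.x import y can actually be resolved, i.e. the apps/ directory (with its __init__.py files) must sit at the top level of one of the listed archives.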
CodePudding user response:
Run the command below from a terminal or from the PyCharm terminal:
spark-submit --master local --deploy-mode client .\filename.py
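If the script also has local dependency modules, the same --py-files idea applies in local mode; a sketch, assuming the dependencies have been zipped into a file named deps.zip (a hypothetical name):

# local-mode run, shipping the zipped dependencies alongside the script
spark-submit --master local --deploy-mode client --py-files deps.zip .\filename.py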