How to add .py files to spark-submit?

Time:06-08

I have a spark-submit command that uses --py-files. When I run the script I get the error ModuleNotFoundError: No module named 'apps', even though the imports themselves are correct.

My code looks like :

spark_submit = '''
            /opt/spark/bin/spark-submit \
            --master spark://spark-master:7077 \
            --jars /opt/spark-jars/postgresql-42.2.22.jar \
            --driver-cores {} \
            --driver-memory {} \
            --executor-cores {} \
            --executor-memory {} \
            --num-executors {} \
            --py-files /opt/spark-apps/l2lpt/models/FBP.py,/opt/spark-apps/l2lpt/utils/exceptions.py,/opt/spark-apps/l2lpt/anomaly_detector/anomaly_detector.py,/opt/spark-apps/l2lpt/l2lpt.py \
            /opt/spark-apps/l2lpt/main.py {} {}
        '''.format(
            driver_cores,
            driver_memory,
            executor_cores,
            executor_memory,
            num_executors,
            offset,
            timestamp)

Does the order in which the .py files are listed matter? I can't work out what order they need to be added in. Do I need to list every .py file that my main() function will call?

CodePudding user response:

When you spark-submit a PySpark application (Spark with Python), you specify the .py file you want to run, and pass dependency libraries to --py-files as .py, .zip, or .egg files.

CodePudding user response:

# Run the command below from a terminal (or the PyCharm terminal):

spark-submit --master local --deploy-mode client .\filename.py
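If you are assembling the command from Python, as in the question, building it as an argument list avoids the quoting and whitespace pitfalls of a formatted multi-line string. A sketch (the function name and paths are placeholders; note that `--py-files` takes one comma-separated value):

```python
import shlex


def build_submit_cmd(main_script, py_files=None,
                     master="local", deploy_mode="client"):
    """Assemble a spark-submit invocation as an argument list,
    suitable for subprocess.run(cmd)."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if py_files:
        # --py-files expects a single comma-separated list of paths
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(main_script)
    return cmd


print(shlex.join(build_submit_cmd("main.py", ["deps.zip"])))
# spark-submit --master local --deploy-mode client --py-files deps.zip main.py
```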
