I was trying to run a Spark script named "WorstMoviesSpark" on HDFS using PuTTY.
spark-submit WorstMoviesSpark.py
But when I typed the code above, it returned an error:
python: can't open file '/home/maria_dev/WorstMoviesSpark.py': [Errno 2] No such file or directory
So I typed:
hdfs dfs -ls
and the result was:
Found 11 items
drwxr-xr-x - maria_dev hdfs 0 2021-09-27 18:50 .Trash
drwx------ - maria_dev hdfs 0 2021-09-27 14:41 .staging
-rw-r--r-- 1 admin hdfs 1188 2021-09-27 18:52 WorstMoviesSpark.py
drwxr-xr-x - maria_dev hdfs 0 2021-09-27 14:41 best_genre
drwxr-xr-x - maria_dev hdfs 0 2021-09-27 00:33 best_movies
drwxr-xr-x - maria_dev hdfs 0 2021-09-27 16:43 data
drwxr-xr-x - maria_dev hdfs 0 2021-09-26 21:30 mapreduce
drwxr-xr-x - maria_dev hdfs 0 2021-09-27 18:52 ml-latest-small
drwxr-xr-x - maria_dev hdfs 0 2021-09-26 21:42 pig
drwxr-xr-x - maria_dev hdfs 0 2021-09-27 02:58 temp
drwxr-xr-x - maria_dev hdfs 0 2021-09-26 12:53 tmp
Does my script exist in a different path? Why is this error occurring? Please help. Thanks.
CodePudding user response:
You do not have the required permissions on the code file to execute it via Spark. Run the following command:
hdfs dfs -chmod 777 WorstMoviesSpark.py
Then, in your spark-submit command, set the master to yarn when running the code, as follows:
spark-submit --master yarn --deploy-mode client /hdfs/path/to/WorstMoviesSpark.py
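One more thing worth checking: the `hdfs dfs -ls` listing above shows the file in HDFS, while the "No such file or directory" error refers to the local filesystem, which is where spark-submit in client mode looks for the script. A minimal sketch of copying it down first (the target path is taken from the error message; adjust it if your local home directory differs):

```shell
# The listing came from HDFS, but spark-submit in client mode reads the
# script from the local filesystem. Copy it down first:
hdfs dfs -get WorstMoviesSpark.py /home/maria_dev/WorstMoviesSpark.py

# Then submit against YARN, pointing at the local copy:
spark-submit --master yarn --deploy-mode client /home/maria_dev/WorstMoviesSpark.py
```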