Can't find directory spark using putty


I was trying to run a Spark script named "WorstMoviesSpark" that is stored on HDFS, working over PuTTY:

spark-submit WorstMoviesSpark.py

But when I ran the command above, it returned an error:

python: can't open file '/home/maria_dev/WorstMoviesSpark.py': [Errno 2] No such file or directory

So I typed:

hdfs dfs -ls

and the result was

Found 11 items
drwxr-xr-x   - maria_dev hdfs          0 2021-09-27 18:50 .Trash
drwx------   - maria_dev hdfs          0 2021-09-27 14:41 .staging
-rw-r--r--   1 admin     hdfs       1188 2021-09-27 18:52 WorstMoviesSpark.py
drwxr-xr-x   - maria_dev hdfs          0 2021-09-27 14:41 best_genre
drwxr-xr-x   - maria_dev hdfs          0 2021-09-27 00:33 best_movies
drwxr-xr-x   - maria_dev hdfs          0 2021-09-27 16:43 data
drwxr-xr-x   - maria_dev hdfs          0 2021-09-26 21:30 mapreduce
drwxr-xr-x   - maria_dev hdfs          0 2021-09-27 18:52 ml-latest-small
drwxr-xr-x   - maria_dev hdfs          0 2021-09-26 21:42 pig
drwxr-xr-x   - maria_dev hdfs          0 2021-09-27 02:58 temp
drwxr-xr-x   - maria_dev hdfs          0 2021-09-26 12:53 tmp

Does my code exist in a different path? Why is this error occurring? Please help. Thanks.

CodePudding user response:

You may not have the required permissions on the code file to execute it via Spark: in the listing it is owned by admin, not maria_dev. Run the following command:

hdfs dfs -chmod 777 WorstMoviesSpark.py

Then, in your spark-submit command, set the master to yarn when running the code, as follows:

spark-submit --master yarn --deploy-mode client /hdfs/path/to/WorstMoviesSpark.py
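
It may also be worth checking where the file actually lives. The path in the error message, /home/maria_dev/WorstMoviesSpark.py, is on the local filesystem of the machine you logged in to, while hdfs dfs -ls lists your HDFS home directory, which is a separate namespace. A minimal sketch of copying the script to the local directory before submitting it follows; `fetch_script` is a hypothetical helper name chosen for illustration, and the sketch assumes the file sits in your HDFS home directory as the listing suggests.

```shell
#!/usr/bin/env bash
# Sketch: make sure the script exists on the LOCAL filesystem before spark-submit.
# fetch_script is a hypothetical helper, not part of Hadoop or Spark.
fetch_script() {
    local name="$1"
    if [ -f "$name" ]; then
        # Already present in the current local directory.
        echo "local copy found: $name"
    elif command -v hdfs >/dev/null 2>&1; then
        # Copy from the HDFS home directory into the current local directory.
        hdfs dfs -get "$name" . && echo "copied from HDFS: $name"
    else
        echo "hdfs command not available" >&2
        return 1
    fi
}

# Usage (assumption: run from /home/maria_dev on the cluster node):
# fetch_script WorstMoviesSpark.py && spark-submit WorstMoviesSpark.py
```

If the copy succeeds, the original command spark-submit WorstMoviesSpark.py should at least get past the "No such file or directory" error, since python will then find the file at the local path it reported.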
