While reading a Jupyter notebook that runs a PySpark job, I came across this line:
os.environ['PYSPARK_SUBMIT_ARGS'] = f'--name "test submit" --master yarn --deploy-mode client pyspark-shell'
I mostly understand this line, except for the last argument, pyspark-shell. So I googled PYSPARK_SUBMIT_ARGS to read the full spec for this environment variable. The problem is that I couldn't find any documentation for it. All the search results said to use it, but not why or what it actually does. I couldn't find it in the official documentation either.
I assume it says to use PySpark (Python), not Spark (R), to process my job, but I want to read exactly what it does and how. So where can I read about it?
CodePudding user response:
Here's a link to the code for spark-submit. Look for PYSPARK_SHELL; it's basically used to select which Java class is used to run your code.
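To see where pyspark-shell ends up, it can help to look at how the string is tokenized. As far as I can tell from pyspark's java_gateway.py, the value of PYSPARK_SUBMIT_ARGS is split with shlex and appended to the spark-submit command when the JVM gateway is launched, so the variable has to be set before the SparkContext is created. The sketch below only reproduces that splitting step with the standard library; split_submit_args is a hypothetical helper for illustration, not a PySpark function.

```python
import os
import shlex

# Same value as in the question.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    '--name "test submit" --master yarn --deploy-mode client pyspark-shell'
)


def split_submit_args(default="pyspark-shell"):
    """Hypothetical helper: tokenize PYSPARK_SUBMIT_ARGS the way a shell would.

    Assumption: this mirrors what PySpark does before calling spark-submit;
    check java_gateway.py in your Spark version to confirm.
    """
    return shlex.split(os.environ.get("PYSPARK_SUBMIT_ARGS", default))


tokens = split_submit_args()
print(tokens)
# ['--name', 'test submit', '--master', 'yarn', '--deploy-mode', 'client', 'pyspark-shell']
```

The options come first and pyspark-shell is the last token, which is the position where spark-submit normally expects the application file or jar. That is the value the spark-submit code checks for when it picks the class to run.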