My Spark application makes some API calls that don't use the SparkSession at all. My understanding is that code which doesn't touch Spark gets executed on the master node.

Why do I want to know this? I'm getting a Java heap space error while trying to POST some files via those API calls, and I believe that upgrading the master node and increasing the driver memory would solve it.

I'd like to understand how this kind of application is executed on a Spark cluster. Is my understanding correct, or am I missing something?
CodePudding user response:
It depends. Closures/functions passed to built-in transformations such as `transform`, any code inside UDFs you create, and code in `foreachBatch` (and a few other places) run on the workers. All other code runs on the driver.
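To illustrate the split, here is a minimal PySpark sketch. The endpoint URL, payload, and column names are hypothetical placeholders, not taken from the question:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import requests  # plain Python HTTP library, nothing Spark-specific

spark = SparkSession.builder.appName("driver-vs-executor").getOrCreate()

# Runs on the DRIVER: ordinary code outside any Spark transformation.
# If this POST buffers large files in memory, it is the driver's heap
# (spark.driver.memory) that has to hold them.
resp = requests.post("https://example.com/upload", data=b"...payload...")

df = spark.createDataFrame([("a",), ("b",)], ["value"])

# Runs on the EXECUTORS (workers): the body of a UDF is serialized and
# executed on whichever nodes hold the data partitions.
@udf(returnType=StringType())
def shout(s):
    return s.upper()

df.select(shout("value")).show()  # the action itself is triggered from the driver
```

So if the heap error comes from the plain `requests.post` path (driver-side code), increasing driver memory is the right lever; if it occurred inside a UDF or `foreachBatch`, you would tune executor memory instead.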